1. Introduction

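Fisher information not only plays a central role in statistical inference, but also coincides
with a natural inner product in a distribution family. It is defined as

$$J_\theta := \int_\Omega l_\theta(\omega)^2 p_\theta(\omega)\,d\omega, \qquad l_\theta(\omega)\,p_\theta(\omega) = \frac{\partial p_\theta(\omega)}{\partial\theta} \qquad (1)$$

for a probability distribution family $\{p_\theta \mid \theta \in \Theta \subset \mathbb{R}\}$ with a probability space $\Omega$.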
However, the quantum version of Fisher information cannot be uniquely determined.
In general, there is a serious arbitrariness concerning the order among non-commutative
observables in the quantization of products of several variables. The arbitrariness of
the quantum version of Fisher information is due to the same reason.
The geometrical properties of its quantum analogues have been discussed by many
authors.

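One quantum analogue is the Kubo-Mori-Bogoljubov (KMB) Fisher information
$\tilde{J}_\theta$ defined by

$$\tilde{J}_\theta := \int_0^1 \mathrm{Tr}\,\rho_\theta^{t}\tilde{L}_\theta\rho_\theta^{1-t}\tilde{L}_\theta\,dt, \qquad \int_0^1 \rho_\theta^{t}\tilde{L}_\theta\rho_\theta^{1-t}\,dt = \frac{d\rho_\theta}{d\theta} \qquad (2)$$

for a quantum state family $\{\rho_\theta \in \mathcal{S}(\mathcal{H}) \mid \theta \in \Theta\}$, where $\mathcal{S}(\mathcal{H})$ is the set of density matrices
on $\mathcal{H}$ and the Hilbert space $\mathcal{H}$ corresponds to the physical system of interest.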



As proven in Appendix B, it can be characterized as the limit of quantum relative
entropy, which plays an important role in several topics of quantum information
theory, for example, quantum channel coding, quantum source coding and
quantum hypothesis testing [10][11]. Moreover, as mentioned in section 3, this inner
product is closely related to the canonical correlation of the linear response theory in
statistical mechanics [12]. As mentioned in Appendix A, it appears to be the most
natural quantum extension from an information geometrical viewpoint. Thus, one
might expect that it is significant in quantum estimation, but its estimation-theoretical
characterization has not been sufficiently clarified.

Another quantum analogue is the symmetric logarithmic derivative (SLD) Fisher
information

$$J_\theta := \mathrm{Tr}\,L_\theta^2\rho_\theta, \qquad \frac{1}{2}(L_\theta\rho_\theta + \rho_\theta L_\theta) = \frac{d\rho_\theta}{d\theta} \qquad (3)$$

where $L_\theta$ is called the symmetric logarithmic derivative [13]. It is closely related to the
achievable lower bound of the mean square error (MSE) not only for the one-parameter case,
but also for the multi-parameter case in quantum estimation.



The difference between the two can be regarded as the difference in the order of the 
operators, and reflects the two ways of defining Fisher information for a probability 
distribution family. 

Currently, the former is closely related to quantum information theory while
the latter is related to quantum estimation theory. These two inner products have
been discussed only in separate contexts. In this paper, to clarify the difference between
them, we introduce a large deviation viewpoint of quantum estimation as a unified
viewpoint, whose classical version was initiated by Bahadur [19][20][21]. This method
may not be conventional in mathematical statistics, but seems a suitable setting for a
comparison between two quantum analogues from an estimation viewpoint. This type
of comparison was initiated by Nagaoka, and is discussed in further depth in this
paper. Such a large deviation evaluation of quantum estimation is closely related to the
exponent of the overflow probability of quantum universal variable-length coding.



This paper is structured as follows. Before we state the main results, we review
classical estimation theory, including Bahadur's large deviation theory, in section 2.
After this review, we briefly outline the main results in section 3, i.e.,
the difference is characterized from three contexts. To simplify the notation, even where
the Gauss notation [ ] is needed, we omit it when this does not cause confusion. Some
proofs are very complicated and are presented in the appendices.

2. Review of classical estimation theory 

We review the relationship between parameter estimation for the probability
distribution family $\{p_\theta \mid \theta \in \Theta \subset \mathbb{R}\}$ with a probability space $\Omega$ and its Fisher
information. Fisher information is given not only by (1), but
also by the limit of the relative entropy (Kullback-Leibler divergence)
$D(p\|q) := \int_\Omega (\log p(\omega) - \log q(\omega))\,p(\omega)\,d\omega$ as

$$J_\theta := \lim_{\epsilon\to 0}\frac{2}{\epsilon^2}D(p_{\theta+\epsilon}\|p_\theta). \qquad (4)$$

These two definitions (1) and (4) coincide under some regularity conditions for the family.
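
As a quick numerical illustration of how definitions (1) and (4) agree, the following sketch uses the Bernoulli family $p_\theta(1) = \theta$, $p_\theta(0) = 1 - \theta$. The choice of family and all function names are assumptions made for illustration only; they do not appear in the text.

```python
import math


def kl_bernoulli(a: float, b: float) -> float:
    """Relative entropy D(p_a || p_b) between Bernoulli(a) and Bernoulli(b)."""
    return a * math.log(a / b) + (1.0 - a) * math.log((1.0 - a) / (1.0 - b))


def fisher_bernoulli(theta: float) -> float:
    """Fisher information from definition (1): sum over outcomes of l^2 * p."""
    score_one = 1.0 / theta             # l_theta at outcome 1
    score_zero = -1.0 / (1.0 - theta)   # l_theta at outcome 0
    return score_one ** 2 * theta + score_zero ** 2 * (1.0 - theta)


def fisher_from_divergence(theta: float, eps: float) -> float:
    """Finite-eps version of definition (4): (2 / eps^2) D(p_{theta+eps} || p_theta)."""
    return 2.0 * kl_bernoulli(theta + eps, theta) / eps ** 2


theta = 0.3
for eps in (1e-1, 1e-2, 1e-3):
    # The second column approaches the third as eps shrinks.
    print(eps, fisher_from_divergence(theta, eps), fisher_bernoulli(theta))
```

For this family the gap between the two columns shrinks roughly linearly in eps, which is what one expects from a third-order remainder in the expansion of the divergence.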

Next, we consider a map $f$ from $\Omega$ to $\Omega'$. Similarly to other information quantities
(for example, the Kullback-Leibler divergence), the inequality

$$J_\theta \ge J'_\theta \qquad (5)$$

holds, where $J'_\theta$ is the Fisher information of the family $\{p_\theta \circ f^{-1} \mid \theta \in \Theta\}$. Inequality (5) is
called the monotonicity. According to Cencov [25], any information quantity satisfying
(5) coincides with a constant times Fisher information $J_\theta$.

For an estimator that is defined as a map from the data set $\Omega$ to the parameter set
$\Theta$, we sometimes consider the unbiasedness condition:

$$\int_\Omega T(\omega)\,p_\theta(\omega)\,d\omega = \theta, \qquad \forall\theta \in \Theta. \qquad (6)$$

The MSE of any unbiased estimator $T$ is evaluated by the following inequality (Cramér-Rao
inequality),

$$\int_\Omega (T(\omega) - \theta)^2 p_\theta(\omega)\,d\omega \ge \frac{1}{J_\theta}, \qquad (7)$$

which follows from the Schwarz inequality with respect to (w.r.t.) the inner product
$\langle X, Y\rangle := \int_\Omega X(\omega)Y(\omega)\,p_\theta(\omega)\,d\omega$ for variables $X, Y$. When the number of data
$\omega_n := (\omega_1, \ldots, \omega_n)$, which obey the unknown probability $p_\theta$, is sufficiently large, we
discuss a sequence $\{T_n\}$ of estimators $T_n(\omega_n)$. If $\{T_n\}$ is suitable as a sequence of
estimators, we can expect that it converges to the true parameter $\theta$ in probability, i.e.,
it satisfies the weak consistency condition:

$$\lim_{n\to\infty} p_\theta^n\{|T_n - \theta| > \epsilon\} = 0, \qquad \forall\epsilon > 0,\ \forall\theta \in \Theta. \qquad (8)$$

Usually, the performance of a sequence $\{T_n\}$ of estimators is measured by the
speed of its convergence. As one criterion, we focus on the speed of the convergence
in MSE. If a sequence $\{T_n\}$ of estimators satisfies the weak consistency condition and
some regularity conditions, the asymptotic version of the Cramér-Rao inequality,

$$\lim_{n\to\infty} n\int_{\Omega^n} (T_n(\omega_n) - \theta)^2 p_\theta^n(\omega_n)\,d\omega_n \ge \frac{1}{J_\theta} \qquad (9)$$

holds. If it satisfies only the weak consistency condition, it is possible that it surpasses
the bound of (9) at a specific subset. Such a sequence of estimators is called
superefficient. Under the weak consistency condition alone, we can reduce its error to any
amount on a specific subset of measure zero.

As another criterion, we evaluate the decreasing rate of the tail probability:

$$\beta(\{T_n\}, \theta, \epsilon) := \lim_{n\to\infty} -\frac{1}{n}\log p_\theta^n\{|T_n - \theta| > \epsilon\}. \qquad (10)$$



This method was initiated by Bahadur [19][20][21], and was a much discussed topic
among mathematical statisticians in the 1970s. From the monotonicity of the
divergence, we can prove the inequality

$$\beta(\{T_n\}, \theta, \epsilon) \le \min\{D(p_{\theta+\epsilon}\|p_\theta),\ D(p_{\theta-\epsilon}\|p_\theta)\} \qquad (11)$$

for any weakly consistent sequence $\{T_n\}$ of estimators. Its proof is essentially given in
our proof of Theorem 2. Since it is difficult to analyze $\beta(\{T_n\}, \theta, \epsilon)$ except in the case of
an exponential family, we focus on another quantity $\alpha(\{T_n\}, \theta) := \lim_{\epsilon\to 0}\frac{1}{\epsilon^2}\beta(\{T_n\}, \theta, \epsilon)$.
For an exponential family, see Appendix K. Taking the limit $\epsilon \to +0$, we obtain the
inequality

$$\alpha(\{T_n\}, \theta) \le \frac{J_\theta}{2}. \qquad (12)$$

If $T_n$ is the maximum likelihood estimator (MLE), the equality of (12) holds under
some regularity conditions for the family [26]. This type of discussion differs
from the MSE type of discussion in that (12) is derived from the weak consistency
condition alone. Therefore, there is no consistent superefficient estimator w.r.t. the large
deviation evaluation.
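
To make the tail exponent of (10) and the bound (11) concrete, the following sketch evaluates the exact binomial tail of the sample mean (the MLE) in a Bernoulli family and compares its exponent with the right-hand side of (11). The Bernoulli family, the sample sizes and the function names are illustrative assumptions, not part of the text.

```python
import math


def kl_bernoulli(a: float, b: float) -> float:
    """Relative entropy D(p_a || p_b) between Bernoulli(a) and Bernoulli(b)."""
    return a * math.log(a / b) + (1.0 - a) * math.log((1.0 - a) / (1.0 - b))


def log_binomial_pmf(n: int, k: int, theta: float) -> float:
    """Logarithm of the Binomial(n, theta) probability of k successes."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(theta) + (n - k) * math.log(1.0 - theta))


def tail_exponent(n: int, theta: float, eps: float) -> float:
    """-(1/n) log P{|k/n - theta| > eps}, computed with a log-sum-exp for stability."""
    logs = [log_binomial_pmf(n, k, theta)
            for k in range(n + 1) if abs(k / n - theta) > eps]
    top = max(logs)
    log_tail = top + math.log(sum(math.exp(v - top) for v in logs))
    return -log_tail / n


def divergence_bound(theta: float, eps: float) -> float:
    """Right-hand side of (11) for the Bernoulli family."""
    return min(kl_bernoulli(theta + eps, theta), kl_bernoulli(theta - eps, theta))


theta, eps = 0.3, 0.1
for n in (199, 1999):
    # The finite-n exponent approaches the divergence bound as n grows.
    print(n, tail_exponent(n, theta, eps), divergence_bound(theta, eps))
```

The sample sizes are chosen so that no value of k/n falls exactly on the boundary of the tail event, which avoids floating-point ambiguity in the strict inequality.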

Indeed, we can relate the above large deviation type of discussion in estimation
to Stein's lemma in simple hypothesis testing as follows. In simple hypothesis testing,
we decide whether the null hypothesis should be accepted or rejected from the data
$\omega_n := (\omega_1, \ldots, \omega_n)$, which obey an unknown probability. For the decision, we must
define an acceptance region $A_n$ as a subset of $\Omega^n$. If the null hypothesis is $p$ and the
alternative is $q$, the first error probability $\beta_{1,n}(A_n)$ (though the true distribution is $p$,
we reject the null hypothesis) and the second error probability $\beta_{2,n}(A_n)$ (though the true
distribution is $q$, we accept the null hypothesis) are given by

$$\beta_{1,n}(A_n) := 1 - p^n(A_n), \qquad \beta_{2,n}(A_n) := q^n(A_n).$$

Regarding the decreasing rate of the second error probability under the constant
constraint of the first error probability, the equation

$$\lim_{n\to\infty} -\frac{1}{n}\log\min\{\beta_{2,n}(A_n) \mid \beta_{1,n}(A_n) \le \epsilon\} = D(p\|q), \qquad \epsilon > 0 \qquad (13)$$

holds (Stein's lemma). Inequality (11) can be derived from this lemma. We can regard
the large deviation type of evaluation in estimation as Stein's lemma in the case
where the null hypothesis is close to the alternative one.



3. Outline of main results 



Let us return to the quantum case. In a quantum setting, we focus on two quantum
analogues of Fisher information, KMB Fisher information $\tilde{J}_\theta$ and SLD Fisher
information $J_\theta$. Indeed, if the state $\rho_\theta$ is degenerate, the SLD $L_\theta$ is not uniquely
determined. However, as is proven in Appendix C, SLD Fisher information $J_\theta$ is uniquely
determined, i.e., it is independent of the choice of the SLD $L_\theta$.

On the other hand, according to Chap. 7 in Amari and Nagaoka, $\tilde{L}_\theta$ has another
form

$$\tilde{L}_\theta = \frac{d\log\rho_\theta}{d\theta}. \qquad (14)$$

As is proven by using formula (14) in Appendix B, KMB Fisher information $\tilde{J}_\theta$
can be characterized as the limit of the quantum relative entropy
$D(\rho\|\sigma) := \mathrm{Tr}\,\rho(\log\rho - \log\sigma)$ in the following way:

$$\tilde{J}_\theta = \lim_{\epsilon\to 0}\frac{2}{\epsilon^2}D(\rho_{\theta+\epsilon}\|\rho_\theta). \qquad (15)$$

Moreover, in the linear response theory of statistical physics, given an equilibrium state
$\rho$, when a variable $A$ fluctuates with a small value $\delta$, another variable $B$ is also thought
to fluctuate with a constant times $\delta$ [12]. Its coefficient is called the canonical correlation
and is given by

$$\int_0^1 \mathrm{Tr}\,\rho^{t}(A - \mathrm{Tr}\,\rho A)\,\rho^{1-t}(B - \mathrm{Tr}\,\rho B)\,dt. \qquad (16)$$

Thus, KMB Fisher information $\tilde{J}_\theta$ is thought to be more natural from the viewpoint of
statistical physics.

As another quantum analogue, the right logarithmic derivative (RLD) Fisher
information $\check{J}_\theta$:

$$\check{J}_\theta := \mathrm{Tr}\,\rho_\theta\check{L}_\theta\check{L}_\theta^{*}, \qquad \rho_\theta\check{L}_\theta = \frac{d\rho_\theta}{d\theta}$$

is known. When $\rho_\theta$ does not commute with $\frac{d\rho_\theta}{d\theta}$ and $\rho_\theta > 0$, the RLD $\check{L}_\theta$ is not self-adjoint.
Since it is not useful in the one-parameter case, we do not discuss it in this paper. Since
the difference in definitions can be regarded as the difference in the order of operators,
these quantum analogues coincide when all states of the family commute with
each other. However, in the general case, they do not coincide and the inequality $\tilde{J}_\theta \ge J_\theta$
holds, as exemplified in section 4. Concerning some information-geometrical properties,
see Appendix A.

In the following, we consider the roles these quantum analogues of Fisher
information play in parameter estimation for the state family. As is discussed in
detail in section 4, the estimator is described by the pair of a positive operator valued
measure (POVM) $M$ (which corresponds to the measurement and is defined in section
4) and a map from the data set to the parameter space $\Theta$. Similarly to the classical
case, we can define an unbiased estimator. For any unbiased estimator $E$, the SLD
Cramér-Rao inequality

$$V(E) \ge \frac{1}{J_\theta} \qquad (17)$$

holds, where $V(E)$ is the mean square error (MSE) of the estimator $E$.

In an asymptotic setting, as a quantum analogue of the n-i.i.d. condition, we treat
the quantum n-i.i.d. condition in section 5, i.e., we consider the case where the number
of systems independently prepared in the same unknown state is sufficiently large. In
this case, the measurement is denoted by a POVM $M^n$ on the composite system $\mathcal{H}^{\otimes n}$ and
the state is described by the tensor product density matrix $\rho^{\otimes n}$. Of course, such POVMs
include a POVM that requires quantum correlations between the respective quantum
systems in the measurement apparatus. Similarly to the classical case, for a sequence
$E = \{E_n\}$ of estimators, we can define the weak consistency condition given in (31).
In mathematical statistics, the square root n consistency, local asymptotic minimax
theorems and Bayesian theorems are important topics in asymptotic theory, but it
seems too difficult to link these quantum settings and KMB Fisher information $\tilde{J}_\theta$. Thus,
in this paper, in order to compare the two quantum analogues within a unified framework,
we adopt Bahadur's large deviation theory as follows. As is discussed in section 5, we
can similarly define the quantities $\beta(E, \theta, \epsilon)$ and $\alpha(E, \theta)$. Similarly to (11) and (12), under the
weak consistency (WC) condition, the inequalities

$$\beta(E, \theta, \epsilon) \le \min\{D(\rho_{\theta+\epsilon}\|\rho_\theta),\ D(\rho_{\theta-\epsilon}\|\rho_\theta)\}$$

$$\alpha(E, \theta) \le \frac{1}{2}\tilde{J}_\theta \qquad (18)$$

hold. From these discussions, the bound in the large deviation type of evaluation
seems different from the one in the MSE case. However, as mentioned in section 6,
the inequality

$$\alpha(E, \theta) \le \frac{1}{2}J_\theta \qquad (19)$$

holds if the sequence $E$ satisfies the strong consistency (SC) condition introduced in
section 6 as a stronger condition. As is mentioned in section 7, these bounds can be
attained in their respective senses. Therefore, roughly speaking, the difference between
the two quantum analogues can be regarded as the difference in consistency conditions
and can be characterized as

$$\sup_{E:\mathrm{WC}}\lim_{\epsilon\to 0}\frac{1}{\epsilon^2}\beta(E, \theta, \epsilon) = \frac{1}{2}\tilde{J}_\theta$$

$$\sup_{E:\mathrm{SC}}\lim_{\epsilon\to 0}\frac{1}{\epsilon^2}\beta(E, \theta, \epsilon) = \frac{1}{2}J_\theta.$$

Even if we restrict our estimators to strongly consistent ones, the difference between
the two appears as

$$\sup_{M:\mathrm{SC}}\liminf_{\epsilon\to 0}\frac{1}{\epsilon^2}\beta(M, \theta, \epsilon) = \frac{J_\theta}{2} \qquad (20)$$

$$\liminf_{\epsilon\to 0}\frac{1}{\epsilon^2}\sup_{M:\mathrm{SC}}\beta(M, \theta, \epsilon) = \frac{\tilde{J}_\theta}{2}, \qquad (21)$$

where, for a precise statement, as expressed in section 6, we need more complicated
definitions.

However, we should consider that the bound $J_\theta/2$ is more meaningful for the following
two reasons. The first reason is the fact that we can construct a sequence of estimators
attaining the bound $J_\theta/2$ at all points, which is proven in section 7. On the other hand,
there is a sequence of estimators attaining the bound $\tilde{J}_\theta/2$ at one point $\theta$, but it cannot
attain the bound at all points. The other reason is the naturalness of the conditions
for deriving the bound. In other words, an estimator attaining $J_\theta/2$ is natural, but an
estimator attaining $\tilde{J}_\theta/2$ is very irregular. Such a sequence of estimators can be regarded
as a consistent superefficient estimator and does not satisfy regularity conditions other
than the weak consistency condition. This type of discussion of superefficiency is
different from the MSE type of discussion in that any consistent superefficient estimator
is bounded by inequality (18).

To consider the difference between the two quantum analogues of Fisher information
in more detail, we must analyze how we can achieve the bound $\tilde{J}_\theta/2$. It is important in
this analysis to consider the relationship between the above discussion and the quantum
version of Stein's lemma in simple hypothesis testing. Similarly to the classical case,
when the null hypothesis is the state $\rho$ and the alternative is the state $\sigma$, we evaluate
the decreasing rate of the second error probability under the constant constraint $\epsilon > 0$
of the first error probability. As was proven in the quantum Stein's lemma, its exponential
component is given by the quantum relative entropy $D(\rho\|\sigma)$ for any $\epsilon > 0$. Hiai and Petz
[10] constructed a sequence of tests attaining the optimal rate $D(\rho\|\sigma)$, by constructing
a sequence $\{M^n\}$ of POVMs such that

$$\lim_{n\to\infty}\frac{1}{n}D\left(P^{M^n}_{\rho^{\otimes n}}\,\Big\|\,P^{M^n}_{\sigma^{\otimes n}}\right) = D(\rho\|\sigma). \qquad (22)$$

Ogawa and Nagaoka [11] proved that there is no test exceeding the bound $D(\rho\|\sigma)$. It was
proven by Hayashi [27] that, by using group representation theory, we can construct
a POVM satisfying (22) independently of $\rho$. For the reader's convenience, we give a
review of this in Appendix J. As discussed in section 7.2, this type of construction is
useful for the construction of an estimator attaining the bound $\tilde{J}_\theta/2$ at one point. Since
the proper bound of the large deviation is $J_\theta/2$, we cannot regard quantum estimation
as the limit of the quantum Stein's lemma.

In order to consider the properties of estimators attaining the bound $\tilde{J}_\theta/2$ at one
point from another viewpoint, we consider the restriction that makes such a construction
impossible. We introduce a class of estimators whose POVMs do not require a quantum
correlation in the quantum apparatus in section 8. In this class, we assume that the
POVM on the $l$-th system is chosen from the $l - 1$ preceding data. We call such an estimator an
adaptive estimator. When an adaptive estimator $E$ satisfies the weak consistency
condition, the inequality

$$\alpha(E, \theta) \le \frac{1}{2}J_\theta \qquad (23)$$

holds (see section 8). Similarly, we can define a class of estimators that use quantum
correlations up to $m$ systems. We call such an estimator an m-adaptive estimator. For
any m-adaptive weakly consistent estimator $E$, inequality (23) holds. Therefore, it is
impossible to construct a sequence of estimators attaining the bound $\tilde{J}_\theta/2$ if we fix the
number of systems in which we use quantum correlations. As mentioned in section 8,
taking the limit $m \to \infty$, we obtain

$$\lim_{m\to\infty}\lim_{\epsilon\to 0}\sup_{M:\,m\text{-AWC}}\frac{1}{\epsilon^2}\beta(M, \theta, \epsilon) = \frac{J_\theta}{2}, \qquad (24)$$

where m-AWC denotes an m-adaptive weakly consistent estimator. However, as the
third characterization of the difference between the two quantum analogues, as precisely
mentioned in section 8, the equation

$$\lim_{\epsilon\to 0}\lim_{m\to\infty}\sup_{M:\,m\text{-ASC}}\frac{1}{\epsilon^2}\beta(M, \theta, \epsilon) = \frac{\tilde{J}_\theta}{2} \qquad (25)$$

holds, where m-ASC denotes an m-adaptive strongly consistent estimator. A narrower
class of estimators is treated in equation (25) than in equation (21). Equations
(24) and (25) indicate that the order of the limits $\lim_{m\to\infty}$ and $\lim_{\epsilon\to 0}$ is more crucial than
the difference between the two types of consistency.

Remark 1 In the estimation only of the spectrum of a density matrix in a unitary-invariant
family, the natural inner product in the parameter space is unique and
equals the Fisher inner product in the distribution family whose element is the probability
distribution corresponding to the eigenvalues of a density matrix. In addition, the achievable
bound is derived by Keyl and Werner [28], and coincides with the bound uniquely given
by the above inner product. For details, see Appendix H.



4. Review of non-asymptotic setting in quantum estimation 

In a quantum system, in order to discuss the probability distribution which the data
obey, we must define a POVM.

A POVM $M$ is defined as a map from Borel sets of the data set $\Omega$ to the set of
bounded, self-adjoint and positive semi-definite operators, which satisfies

$$M(\emptyset) = 0, \qquad M(\Omega) = I, \qquad \sum_i M(B_i) = M\Big(\bigcup_i B_i\Big) \ \text{for disjoint sets } B_i.$$

If the state on the quantum system $\mathcal{H}$ is a density operator $\rho$ and we perform a
measurement corresponding to a POVM $M$ on the system, the data obey the probability
distribution $P^M_\rho(B) := \mathrm{Tr}\,\rho M(B)$. If a POVM $M$ satisfies $M(B)^2 = M(B)$ for any Borel
set $B$, $M$ is called a projection-valued measure (PVM). The spectral measure of a
self-adjoint operator $X$ is a PVM, and is denoted by $E(X)$. For $1 > \lambda > 0$ and any POVMs
$M_1$ and $M_2$ taking values in $\Omega$, the POVM $B \mapsto \lambda M_1(B) + (1 - \lambda)M_2(B)$ is called the
random combination of $M_1$ and $M_2$ in the ratio $\lambda : 1 - \lambda$. Even if $M_1$'s data set $\Omega_1$ is
different from $M_2$'s data set $\Omega_2$, $M_1$ and $M_2$ can be regarded as POVMs taking values
in the disjoint union set $\Omega_1 \coprod \Omega_2 := (\Omega_1 \times \{1\}) \cup (\Omega_2 \times \{2\})$. In this case, we can define
a random combination of $M_1$ and $M_2$ as a POVM taking values in $\Omega_1 \coprod \Omega_2$ and call
it the disjoint random combination. In this paper, we simplify the probability $P^M_{\rho_\theta}$ and
the relative entropies $D(\rho_{\theta_0}\|\rho_{\theta_1})$ and $D(P^M_{\rho_{\theta_0}}\|P^M_{\rho_{\theta_1}})$ to $P^M_\theta$, $D(\theta_0\|\theta_1)$ and $D^M(\theta_0\|\theta_1)$,
respectively.
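
The two kinds of random combination can be sketched for a qubit with finite outcome sets as follows. The matrices, the state and every function name below are hypothetical choices for illustration; a POVM is represented here simply as a dictionary from outcomes to real 2x2 matrices.

```python
def trace_of_product(a, b):
    """Tr(ab) for 2x2 matrices given as nested lists."""
    return sum(a[i][k] * b[k][i] for i in range(2) for k in range(2))


def scale(c, m):
    """Multiply every entry of the matrix m by the scalar c."""
    return [[c * x for x in row] for row in m]


def add(a, b):
    """Entrywise sum of two matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def outcome_distribution(rho, povm):
    """P^M_rho(w) = Tr rho M(w) for a POVM with a finite outcome set."""
    return {w: trace_of_product(rho, m) for w, m in povm.items()}


def random_combination(lam, povm1, povm2):
    """B -> lam * M1(B) + (1 - lam) * M2(B); both POVMs share one outcome set."""
    return {w: add(scale(lam, povm1[w]), scale(1.0 - lam, povm2[w])) for w in povm1}


def disjoint_random_combination(lam, povm1, povm2):
    """Same mixture, but outcomes are tagged so the data sets stay disjoint."""
    combined = {(w, 1): scale(lam, m) for w, m in povm1.items()}
    combined.update({(w, 2): scale(1.0 - lam, m) for w, m in povm2.items()})
    return combined


rho = [[0.9, 0.2], [0.2, 0.1]]                       # a valid qubit density matrix
pvm_z = {0: [[1.0, 0.0], [0.0, 0.0]], 1: [[0.0, 0.0], [0.0, 1.0]]}
pvm_x = {0: [[0.5, 0.5], [0.5, 0.5]], 1: [[0.5, -0.5], [-0.5, 0.5]]}

mixed = outcome_distribution(rho, random_combination(0.25, pvm_z, pvm_x))
split = outcome_distribution(rho, disjoint_random_combination(0.25, pvm_z, pvm_x))
print(mixed)
print(split)
```

The disjoint version keeps a record of which measurement produced each outcome, so summing its probabilities over the tag recovers the distribution of the plain random combination.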

In one-parameter quantum estimation, the estimator is described by a pair
comprising a POVM and a map from its data set to the real number set $\mathbb{R}$. Since the
POVM $M \circ T^{-1}$ takes values in $\mathbb{R}$, we can regard any estimator as
a POVM taking values in $\mathbb{R}$. In order to evaluate the MSE, Helstrom
[13] derived the SLD Cramér-Rao inequality (29) as a quantum counterpart of the
Cramér-Rao inequality (7). If an estimator $M$ satisfies

$$\int_{\mathbb{R}} x\,\mathrm{Tr}\,\rho_\theta M(dx) = \theta, \qquad \forall\theta \in \Theta, \qquad (26)$$

it is called unbiased. If $\theta - \theta_0$ is sufficiently small, we can obtain the following
approximation in the neighborhood of $\theta_0$:

$$\int_{\mathbb{R}} x\,\mathrm{Tr}\,\rho_{\theta_0}M(dx) + \left(\int_{\mathbb{R}} x\,\mathrm{Tr}\,\frac{\partial\rho_\theta}{\partial\theta}\bigg|_{\theta=\theta_0}M(dx)\right)(\theta - \theta_0) \cong \theta_0 + (\theta - \theta_0).$$

It implies the following two conditions:

$$\int_{\mathbb{R}} x\,\mathrm{Tr}\,\frac{\partial\rho_\theta}{\partial\theta}\bigg|_{\theta=\theta_0}M(dx) = 1 \qquad (27)$$

$$\int_{\mathbb{R}} x\,\mathrm{Tr}\,\rho_{\theta_0}M(dx) = \theta_0. \qquad (28)$$

If an estimator $M$ satisfies (27) and (28), it is called locally unbiased at $\theta_0$. For any
locally unbiased estimator $M$ (at $\theta_0$), the inequality, which is called the SLD Cramér-Rao
inequality,

$$\int_{\mathbb{R}} (x - \theta_0)^2\,\mathrm{Tr}\,\rho_{\theta_0}M(dx) \ge \frac{1}{J_{\theta_0}} \qquad (29)$$

holds. Similarly to the classical case, this inequality is derived from the Schwarz
inequality with respect to the SLD Fisher inner product $\langle X|Y\rangle := \mathrm{Tr}\,\rho_\theta\frac{XY + YX}{2}$.
The equality of (29) holds when the estimator is given by the spectral decomposition
$E\big(\frac{L_{\theta_0}}{J_{\theta_0}} + \theta_0\big)$ of $\frac{L_{\theta_0}}{J_{\theta_0}} + \theta_0$, where $L_{\theta_0}$ is the SLD at $\theta_0$ and is defined by (3). This implies that
SLD Fisher information $J_{\theta_0}$ coincides with the Fisher information at $\theta_0$ of the probability
family $\big\{P_\theta^{E(L_{\theta_0}/J_{\theta_0} + \theta_0)} \mid \theta \in \Theta\big\}$. The monotonicity of quantum relative entropy
gives the following evaluation of this probability family:

$$D^{E(L_{\theta_0}/J_{\theta_0} + \theta_0)}(\theta\|\theta_0) \le D(\theta\|\theta_0).$$

Taking the limit $\theta \to \theta_0$, we have

$$J_{\theta_0} \le \tilde{J}_{\theta_0}. \qquad (30)$$

In this paper, we discuss inequality (30) from the viewpoint of the large deviation type
of evaluation of quantum estimation. The following families are treated as simple
examples of the one-parameter quantum state family in what follows.

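Example 1 [One-parameter equatorial spin 1/2 system state family]:

$$\rho_\theta := \frac{1}{2}\begin{pmatrix} 1 + r\cos\theta & r\sin\theta \\ r\sin\theta & 1 - r\cos\theta \end{pmatrix}, \qquad 0 \le \theta < 2\pi.$$

In this family, we calculate

$$D(\rho_\theta\|\rho_0) = \frac{r}{2}(1 - \cos\theta)\log\frac{1 + r}{1 - r}$$

$$\tilde{J}_\theta = \frac{r}{2}\log\frac{1 + r}{1 - r}$$

$$J_\theta = r^2.$$

Since the relations $\tilde{J}_\theta = \infty$ and $J_\theta = 1$ hold in the case of $r = 1$, the two quantum
analogues are completely different.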
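
The closed forms of Example 1 are easy to tabulate. The sketch below evaluates the KMB and SLD Fisher informations of the equatorial qubit family as functions of the Bloch radius $r$ and checks numerically that $(2/\theta^2) D(\rho_\theta\|\rho_0)$ approaches the KMB value, as in (15). The function names are illustrative assumptions; the formulas are the ones stated in the example.

```python
import math


def relative_entropy_equator(theta: float, r: float) -> float:
    """D(rho_theta || rho_0) for the equatorial family with Bloch radius 0 < r < 1."""
    return 0.5 * r * (1.0 - math.cos(theta)) * math.log((1.0 + r) / (1.0 - r))


def kmb_fisher_equator(r: float) -> float:
    """KMB Fisher information (r / 2) log((1 + r) / (1 - r))."""
    return 0.5 * r * math.log((1.0 + r) / (1.0 - r))


def sld_fisher_equator(r: float) -> float:
    """SLD Fisher information r^2."""
    return r ** 2


for r in (0.1, 0.5, 0.9, 0.999):
    # The KMB value grows without bound as r -> 1, while the SLD value tends to 1.
    print(r, kmb_fisher_equator(r), sld_fisher_equator(r))

theta = 1e-3
print(2.0 * relative_entropy_equator(theta, 0.5) / theta ** 2, kmb_fisher_equator(0.5))
```

Both quantities agree to leading order for small $r$ (each behaves like $r^2$), so the gap between them is a feature of nearly pure states.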

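Example 2 [One-parameter quantum Gaussian state family and half-line
quantum Gaussian state family]: We define the boson coherent vector
$|\alpha\rangle := e^{-|\alpha|^2/2}\sum_{n=0}^{\infty}\frac{\alpha^n}{\sqrt{n!}}|n\rangle$, where $|n\rangle$ is the number vector on $L^2(\mathbb{R})$. The quantum Gaussian
state is defined as

$$\rho_\theta := \frac{1}{\pi N}\int_{\mathbb{C}} |\alpha\rangle\langle\alpha|\,e^{-\frac{|\alpha - \theta|^2}{N}}\,d^2\alpha, \qquad \forall\theta \in \mathbb{C}.$$

We call $\{\rho_\theta \mid \theta \in \mathbb{R}\}$ the one-parameter quantum Gaussian state family, and call
$\{\rho_\theta \mid \theta \ge 0\}$ (i.e., $\theta \in \mathbb{R}^+ = [0, \infty)$) the half-line quantum Gaussian state family. In this
family, we can calculate

$$D(\rho_\theta\|\rho_{\theta_0}) = \log\left(1 + \frac{1}{N}\right)|\theta - \theta_0|^2$$

$$\tilde{J}_\theta = 2\log\left(1 + \frac{1}{N}\right)$$

$$J_\theta = \frac{2}{N + \frac{1}{2}}.$$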
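
For the quantum Gaussian family, the two Fisher informations depend only on the parameter $N$. The sketch below assumes the closed forms $\tilde{J}_\theta = 2\log(1 + 1/N)$ and $J_\theta = 2/(N + 1/2)$, which are the values consistent with equations (48) to (51) later in the paper, and compares them over a range of $N$; the function names are hypothetical.

```python
import math


def kmb_fisher_gaussian(noise_n: float) -> float:
    """Assumed closed form of the KMB Fisher information: 2 log(1 + 1/N)."""
    return 2.0 * math.log1p(1.0 / noise_n)


def sld_fisher_gaussian(noise_n: float) -> float:
    """Assumed closed form of the SLD Fisher information: 2 / (N + 1/2)."""
    return 2.0 / (noise_n + 0.5)


for noise_n in (0.01, 0.1, 1.0, 10.0, 100.0):
    kmb = kmb_fisher_gaussian(noise_n)
    sld = sld_fisher_gaussian(noise_n)
    # The ratio is large for small N and tends to 1 for large N.
    print(noise_n, kmb, sld, kmb / sld)
```

Under these assumptions the two analogues nearly coincide for very noisy states (large $N$) and separate sharply as the states approach pure coherent states (small $N$).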
5. The bound under the weak consistency condition

We introduce the quantum independent-identical density (i.i.d.) condition in order to
treat an asymptotic setting. Suppose that $n$ independent physical systems are prepared
in the same state $\rho$. Then, the quantum state of the composite system is described by

$$\rho^{\otimes n} := \underbrace{\rho \otimes \cdots \otimes \rho}_{n} \ \text{ on } \mathcal{H}^{\otimes n},$$

where the tensor product space $\mathcal{H}^{\otimes n}$ is defined by

$$\mathcal{H}^{\otimes n} := \underbrace{\mathcal{H} \otimes \cdots \otimes \mathcal{H}}_{n}.$$

We call this condition the quantum i.i.d. condition, which is a quantum analogue of the
independent-identical distribution condition. In this setting, any estimator is described
by a POVM $M^n$ on $\mathcal{H}^{\otimes n}$, whose data set is $\mathbb{R}$. In this paper, we simplify $P^{M^n}_{\rho_\theta^{\otimes n}}$ and
$D\big(P^{M^n}_{\rho_{\theta_0}^{\otimes n}}\big\|P^{M^n}_{\rho_{\theta_1}^{\otimes n}}\big)$ to $P^{M^n}_\theta$ and $D^{M^n}(\theta_0\|\theta_1)$. The notation $M \times n$ denotes the POVM in
which we perform the POVM $M$ for the respective $n$ systems.

Definition 1 [Weak consistency condition]: A sequence of estimators
$M := \{M^n\}_{n=1}^{\infty}$ is called weakly consistent if

$$\lim_{n\to\infty} P^{M^n}_\theta\{|\hat\theta - \theta| > \epsilon\} = 0, \qquad \forall\theta \in \Theta,\ \forall\epsilon > 0, \qquad (31)$$

where $\hat\theta$ is the estimated value.

This definition means that the estimated value $\hat\theta$ converges to the true value $\theta$ in
probability, and can be regarded as the quantum extension of (8).

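Now, we focus on the exponential component of the tail probability as follows:

$$\beta(M, \theta, \epsilon) := \limsup_{n\to\infty} -\frac{1}{n}\log P^{M^n}_\theta\{|\hat\theta - \theta| > \epsilon\}.$$

We usually discuss the following value instead of $\beta(M, \theta, \epsilon)$

$$\alpha(M, \theta) := \limsup_{\epsilon\to 0}\frac{1}{\epsilon^2}\beta(M, \theta, \epsilon) \qquad (32)$$

because it is too difficult to discuss $\beta(M, \theta, \epsilon)$. The following theorem can be proven
from the monotonicity of the quantum relative entropy.

Theorem 2 (Nagaoka) If a sequence of estimators $M = \{M^n\}_{n=1}^{\infty}$, where $M^n$ is a POVM
on $\mathcal{H}^{\otimes n}$, satisfies the weak consistency condition (31), the inequalities

$$\beta(M, \theta, \epsilon) \le \inf\{D(\rho_{\theta'}\|\rho_\theta) \mid |\theta - \theta'| > \epsilon\} \qquad (33)$$

$$\alpha(M, \theta) \le \frac{\tilde{J}_\theta}{2} \qquad (34)$$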

hold.
Even if the parameter set $\Theta$ is not open (e.g., the closed half-line $\mathbb{R}^+ := [0, \infty)$), this
theorem holds.

Proof: The monotonicity of the quantum relative entropy yields the inequality

$$D(\rho_{\theta'}^{\otimes n}\|\rho_\theta^{\otimes n}) \ge p_{n,\theta'}\log\frac{p_{n,\theta'}}{p_{n,\theta}} + (1 - p_{n,\theta'})\log\frac{1 - p_{n,\theta'}}{1 - p_{n,\theta}}$$

for any $\theta'$ satisfying $|\theta' - \theta| > \epsilon$, where we denote the probability $P^{M^n}_{\theta'}\{|\hat\theta - \theta| > \epsilon\}$ by
$p_{n,\theta'}$. Using the inequality $-(1 - p_{n,\theta'})\log(1 - p_{n,\theta}) \ge 0$, we have

$$-\frac{1}{n}\log P^{M^n}_\theta\{|\hat\theta - \theta| > \epsilon\} = -\frac{1}{n}\log p_{n,\theta} \le \frac{D(\rho_{\theta'}^{\otimes n}\|\rho_\theta^{\otimes n}) + h(p_{n,\theta'})}{n\,p_{n,\theta'}}, \qquad (35)$$

where $h$ is the binary entropy defined by $h(x) := -x\log x - (1 - x)\log(1 - x)$. Since
the assumption guarantees that $p_{n,\theta'} \to 1$, the inequality

$$\beta(M, \theta, \epsilon) \le D(\rho_{\theta'}\|\rho_\theta) \qquad (36)$$

holds, where we use the additivity of quantum relative entropy:

$$D(\rho_{\theta'}^{\otimes n}\|\rho_\theta^{\otimes n}) = n\,D(\rho_{\theta'}\|\rho_\theta).$$

Thus, we obtain (33). Taking the limit $\epsilon \to 0$ in inequality (33), we obtain (34). ■
As another proof, we can prove this inequality as a corollary of the quantum Stein's
lemma [10][11].
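
The step from the binary relative entropy to inequality (35) is purely classical, so it can be checked numerically on its own. In the sketch below, `p_alt` plays the role of $p_{n,\theta'}$ and `p_true` the role of $p_{n,\theta}$ (with $n = 1$); these names and the grid of test values are illustrative assumptions.

```python
import math


def binary_entropy(x: float) -> float:
    """h(x) = -x log x - (1 - x) log(1 - x), with h(0) = h(1) = 0."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log(x) - (1.0 - x) * math.log(1.0 - x)


def binary_relative_entropy(p: float, q: float) -> float:
    """d(p || q) = p log(p/q) + (1 - p) log((1 - p)/(1 - q)) for 0 < p, q < 1."""
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def tail_bound(p_alt: float, p_true: float) -> float:
    """Upper bound on -log p_true of the same form as the right-hand side of (35)."""
    return (binary_relative_entropy(p_alt, p_true) + binary_entropy(p_alt)) / p_alt


grid = [i / 20.0 for i in range(1, 20)]
violations = [(a, b) for a in grid for b in grid
              if -math.log(b) > tail_bound(a, b) + 1e-12]
print("violations on the grid:", len(violations))

# The bound becomes tight when p_alt is close to 1, which is the situation
# guaranteed by weak consistency in the proof.
print(-math.log(1e-3), tail_bound(1.0 - 1e-6, 1e-3))
```

Because the slack equals $-(1 - p')\log(1 - p)/p'$, it vanishes as $p' \to 1$, which is why the proof only needs $p_{n,\theta'} \to 1$.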



6. The bound under the strong consistency condition 

As discussed in section 4, the SLD Cramér-Rao inequality guarantees that the lower
bound of MSE is given by SLD Fisher information. Therefore, it is expected that the
bound is connected with SLD Fisher information for large deviation. In order to discuss
the relationship between SLD Fisher information and the bound for large deviation, we
need another characterization with respect to the limit of the tail probability. We thus
define

$$\underline{\beta}(M, \theta, \epsilon) := \liminf_{n\to\infty} -\frac{1}{n}\log P^{M^n}_\theta\{|\hat\theta - \theta| > \epsilon\}$$

$$\underline{\alpha}(M, \theta) := \liminf_{\epsilon\to 0}\frac{1}{\epsilon^2}\underline{\beta}(M, \theta, \epsilon). \qquad (37)$$

In the following, we attempt to link the quantity $\underline{\alpha}(M, \theta)$ with SLD Fisher
information. For this purpose, it is suitable to focus on an information quantity that
satisfies the additivity and the monotonicity, as in the proof of Theorem 2. Its limit
should be SLD Fisher information. The Bures distance
$b(\rho, \sigma) := \sqrt{2(1 - \mathrm{Tr}\,|\sqrt{\rho}\sqrt{\sigma}|)} = \sqrt{\min_{U:\text{unitary}}\mathrm{Tr}\,(\sqrt{\rho} - \sqrt{\sigma}U)(\sqrt{\rho} - \sqrt{\sigma}U)^{*}}$
is known to be an information quantity
whose limit is SLD Fisher information, as mentioned in Lemma 3. Of course, it can be
regarded as a quantum analogue of the Hellinger distance, and satisfies the monotonicity.

Lemma 3 (Uhlmann [31], Matsumoto [32]) If there exists an SLD $L_\theta$ satisfying
(3), then the equation

$$\frac{1}{4}J_\theta = \lim_{\epsilon\to 0}\frac{b^2(\rho_\theta, \rho_{\theta+\epsilon})}{\epsilon^2} \qquad (38)$$

holds.



A proof of Lemma 3 is given in Appendix C. As discussed later, the Bures
distance satisfies the monotonicity. Unfortunately, the Bures distance does not satisfy
the additivity.

However, the quantum affinity $I(\rho\|\sigma) := -8\log\mathrm{Tr}\,|\sqrt{\rho}\sqrt{\sigma}| = -8\log\big(1 - \frac{1}{2}b(\rho, \sigma)^2\big)$
satisfies the additivity:

$$I(\rho^{\otimes n}\|\sigma^{\otimes n}) = n\,I(\rho\|\sigma). \qquad (39)$$

Its classical version is called affinity in the following form [33]:

$$I(p\|q) = -8\log\Big(\sum_i \sqrt{p_i}\sqrt{q_i}\Big). \qquad (40)$$

As a trivial deformation of (38), the equation

$$\lim_{\epsilon\to 0}\frac{I(\rho_\theta\|\rho_{\theta+\epsilon})}{\epsilon^2} = J_\theta \qquad (41)$$

holds. The quantum affinity satisfies the monotonicity w.r.t. any measurement $M$ (Jozsa
[34], Fuchs [35]):

$$I(\rho\|\sigma) \ge I\big(P^M_\rho\big\|P^M_\sigma\big) = -8\log\sum_\omega \sqrt{P^M_\rho(\omega)}\sqrt{P^M_\sigma(\omega)}. \qquad (42)$$

The simplest proof of (42) is given by Fuchs [35], who directly proved that

$$\mathrm{Tr}\,|\sqrt{\rho}\sqrt{\sigma}| \le \sum_\omega \sqrt{P^M_\rho(\omega)}\sqrt{P^M_\sigma(\omega)}. \qquad (43)$$

For the reader's convenience, a proof of (43) is given in Appendix D. From (39), (41) and
(42), we can expect that SLD Fisher information is, in a sense, closely related to a large
deviation type of bound. From the additivity and the monotonicity of the quantum
affinity, we can show the following lemma.



Lemma 4 The inequality

$$4\inf_{\{s \mid 1 \ge s \ge 0\}}\big(\underline{\beta}'(M, \theta, s\delta) + \underline{\beta}'(M, \theta + \delta, (1 - s)\delta)\big) \le I(\rho_\theta\|\rho_{\theta+\delta}) \qquad (44)$$

holds, where we define $\underline{\beta}'(M, \theta, \delta) := \lim_{\epsilon\to +0}\underline{\beta}(M, \theta, \delta - \epsilon)$.

A proof of Lemma 4 is given in Appendix E. However, Lemma 4 cannot yield an
inequality w.r.t. $\underline{\alpha}(M, \theta)$ under the weak consistency condition, unlike inequality (36).
Therefore, we consider a stronger condition, which is given in the following.

Definition 5 [Strong consistency condition]: A sequence of estimators
$M = \{M^n\}_{n=1}^{\infty}$ is called strongly consistent if the convergence of (37) is uniform for the
parameter $\theta$ and if $\underline{\alpha}(M, \theta)$ is continuous for $\theta$. A sequence of estimators is called
strongly consistent at $\theta$ if there exists a neighborhood $U$ of $\theta$ such that it is strongly
consistent in $U$.

The square root n consistency is familiar in the field of mathematical statistics. However,
in the large deviation setting, this strong consistency seems more suitable than the
square root n consistency.

As a corollary of Lemma 4, we have the following theorem.

Theorem 6 Assume that there exists an SLD $L_\theta$ satisfying (3). If a sequence of
estimators $M = \{M^n\}_{n=1}^{\infty}$ is strongly consistent at $\theta$, then the inequality

$$\underline{\alpha}(M, \theta) \le \frac{J_\theta}{2} \qquad (45)$$

holds.

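Proof: From the above assumption, for any real $\epsilon > 0$ and any element $\theta \in \Theta$, there
exists a sufficiently small real $\delta > 0$ such that $(\underline{\alpha}(M, \theta) - \epsilon)\epsilon'^2 \le \underline{\beta}'(M, \theta, \epsilon'),\ \underline{\beta}'(M, \theta + \delta, \epsilon')$
for $\forall\epsilon' < \delta$. Therefore, inequality (44) yields the relations

$$2(\underline{\alpha}(M, \theta) - \epsilon)\delta^2 = 4(\underline{\alpha}(M, \theta) - \epsilon)\inf_{\{s \mid 1 \ge s \ge 0\}}\big(s^2\delta^2 + (1 - s)^2\delta^2\big)$$

$$\le 4\inf_{\{s \mid 1 \ge s \ge 0\}}\big(\underline{\beta}'(M, \theta, s\delta) + \underline{\beta}'(M, \theta + \delta, (1 - s)\delta)\big) \le I(\rho_\theta\|\rho_{\theta+\delta}). \qquad (46)$$

Lemma 3 and (46) guarantee (45) for $\forall\theta \in \Theta$. ■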



Remark 2 Inequality (43) can be regarded as a special case of the monotonicity w.r.t.
any trace-preserving CP (completely positive) map $C : \mathcal{S}(\mathcal{H}_1) \to \mathcal{S}(\mathcal{H}_2)$:

$$\big(\mathrm{Tr}\,|\sqrt{\rho}\sqrt{\sigma}|\big)^2 \le \big(\mathrm{Tr}\,|\sqrt{C(\rho)}\sqrt{C(\sigma)}|\big)^2 \qquad (47)$$

which is proven by Jozsa [34], because the map $\rho \mapsto P^M_\rho$ can be regarded as a
trace-preserving CP map from the C* algebra of bounded operators on $\mathcal{H}$ to the commutative
C* algebra $C(\Omega)$, where $\Omega$ is the data set.
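
In the commutative (classical) case, the additivity (39) and the monotonicity (42) of the affinity can be checked directly on small probability vectors. The sketch below does this for product distributions and for a coarse-graining of outcomes, which is the classical counterpart of a trace-preserving CP map; the distributions and function names are illustrative assumptions.

```python
import math
from itertools import product


def affinity(p, q):
    """Classical affinity I(p || q) = -8 log sum_i sqrt(p_i q_i), as in (40)."""
    return -8.0 * math.log(sum(math.sqrt(a * b) for a, b in zip(p, q)))


def product_distribution(p, n):
    """n-fold i.i.d. extension of p, listed over all outcome strings of length n."""
    indices = range(len(p))
    return [math.prod(p[i] for i in idx) for idx in product(indices, repeat=n)]


def coarse_grain(p, blocks):
    """Merge outcomes: each block of indices becomes a single outcome."""
    return [sum(p[i] for i in block) for block in blocks]


p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]

single = affinity(p, q)
triple = affinity(product_distribution(p, 3), product_distribution(q, 3))
merged = affinity(coarse_grain(p, [[0, 1], [2]]), coarse_grain(q, [[0, 1], [2]]))

print(single, triple / 3.0)   # additivity: the two numbers agree
print(merged, "<=", single)   # monotonicity under coarse-graining
```

The same two properties are what the proof of Lemma 4 relies on in the quantum setting, with the measurement playing the role of the coarse-graining.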



7. Attainabilities of the bounds 

Next, we discuss the attainabilities of the two bounds $\tilde{J}_\theta/2$ and $J_\theta/2$ in their respective senses.
In this section, we discuss the attainabilities in two cases: the first case is the
one-parameter quantum Gaussian state family, and the second case is an arbitrary
one-parameter finite-dimensional quantum state family that satisfies some assumptions.



7.1. One-parameter quantum Gaussian state family 

In this subsection, we discuss the attainabilities in the one-parameter quantum Gaussian
state family.

Theorem 7 In the one-parameter quantum Gaussian state family, the sequence of
estimators $M^s = \{M^{s,n}\}_{n=1}^{\infty}$ (defined in the following) satisfies the strong consistency
condition and the relations

$$\alpha(M^s, \theta) = \underline{\alpha}(M^s, \theta) = \frac{J_\theta}{2} = \frac{1}{N + \frac{1}{2}}. \qquad (48)$$

[Construction of $M^s$]: We perform the POVM $E(Q)$ for all systems, where $Q$ is the
position operator on $L^2(\mathbb{R})$. The estimated value $\xi_n$ is determined to be the mean value
of the $n$ data. ■
Proof: Since the equation

$$|\langle x|\alpha\rangle|^2 = \sqrt{\frac{2}{\pi}}\,e^{-2(x - \mathrm{Re}\,\alpha)^2}$$

holds, we have the equation

$$P^{E(Q)}_\theta(dx) = \sqrt{\frac{2}{\pi(2N + 1)}}\,e^{-\frac{2(x - \theta)^2}{2N + 1}}\,dx.$$

Thus, we obtain the equation

$$P^{M^{s,n}}_\theta(dx) = \sqrt{\frac{2n}{\pi(2N + 1)}}\,e^{-\frac{2n(x - \theta)^2}{2N + 1}}\,dx,$$

which implies that

$$\beta(M^s, \theta, \epsilon) = \lim_{n\to\infty}\frac{-1}{n}\log P^{M^{s,n}}_\theta\{|\xi_n - \theta| > \epsilon\} = \frac{\epsilon^2}{N + \frac{1}{2}}. \qquad (49)$$

Therefore, the sequence of estimators $M^s = \{M^{s,n}\}_{n=1}^{\infty}$ attains the bound $J_\theta/2$ and satisfies
the strong consistency condition. ■
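
The exponent in (49) can be checked numerically from the Gaussian tail alone. The sketch below assumes, as in the proof above, that the mean of the $n$ position data is normally distributed around $\theta$ with variance $(2N + 1)/(4n)$; the function names and the chosen values of $N$, $\epsilon$ and $n$ are illustrative assumptions.

```python
import math


def position_mean_tail_exponent(n: int, eps: float, noise_n: float) -> float:
    """-(1/n) log P{|mean - theta| > eps} for a mean with variance (2N + 1) / (4n)."""
    variance = (2.0 * noise_n + 1.0) / (4.0 * n)
    # Two-sided Gaussian tail: P{|X - theta| > eps} = erfc(eps / sqrt(2 * variance)).
    tail = math.erfc(eps / math.sqrt(2.0 * variance))
    return -math.log(tail) / n


def limiting_exponent(eps: float, noise_n: float) -> float:
    """Value stated in (49): eps^2 / (N + 1/2)."""
    return eps ** 2 / (noise_n + 0.5)


eps, noise_n = 0.5, 1.0
for n in (30, 300, 3000):
    # The finite-n exponent decreases toward the limiting value.
    print(n, position_mean_tail_exponent(n, eps, noise_n), limiting_exponent(eps, noise_n))
```

The remaining gap at finite $n$ comes from the sub-exponential prefactor of the Gaussian tail and decays like $(\log n)/n$. The sample sizes are kept moderate so that `math.erfc` does not underflow.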

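Proposition 8 In the half-line quantum Gaussian state family, the sequence of estimators $M^w = \{M^{w,n}\}_{n=1}^{\infty}$ (defined in the following) satisfies the weak consistency condition and the strong consistency condition at $\mathbb{R}^+ \setminus \{0\}$ and the relations

$$\alpha(M^w,0) = \underline{\alpha}(M^w,0) = \frac{\tilde{J}_0}{2} = \log\left(1+\frac{1}{\overline{N}}\right), \tag{50}$$

$$\alpha(M^w,\theta) = \underline{\alpha}(M^w,\theta) = \frac{J_\theta}{2} = \frac{1}{\overline{N}+\frac{1}{2}}, \quad \forall\theta \in \mathbb{R}^+ \setminus \{0\}. \tag{51}$$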

This proposition indicates the significance of the uniformity of the convergence of (|3 
This proposition is proven in Appendix G.



[Construction of $M^w$]: We perform the following unitary evolution:

$$\rho_\theta^{\otimes n} \mapsto \rho_{\sqrt{n}\theta} \otimes \rho_0^{\otimes(n-1)}.$$

For details, see Appendix F. We perform the number measurement $E(N)$ of the first system, whose state is $\rho_{\sqrt{n}\theta}$, and let $k$ be its data, where the number operator $N$ is defined as $N := \sum_n n|n\rangle\langle n|$. The estimated value $T_n$ is determined by $T_n := \sqrt{k/n}$. ■
Theorem 9 In the one-parameter quantum Gaussian state family, for any $\theta_1$, the sequence of estimators $M^w_{\theta_1} = \{M^{w,n}_{\theta_1}\}_{n=1}^{\infty}$ (defined in the following) satisfies the weak consistency condition and the relations

$$\alpha(M^w_{\theta_1},\theta_1) = \underline{\alpha}(M^w_{\theta_1},\theta_1) = \frac{\tilde{J}_{\theta_1}}{2} = \log\left(1+\frac{1}{\overline{N}}\right). \tag{52}$$

[Construction of $M^w_{\theta_1}$]: We divide $n$ systems into two groups. One consists of $\sqrt{n}$ systems and the other of $n-\sqrt{n}$ systems. We perform the PVM $E(Q)$ for every system in the first group. Let $\xi_{\sqrt{n}}$ be the mean value in the first group, i.e., we perform the PVM $M^{s,\sqrt{n}}$ for the first group. At the second step, we perform the following unitary evolution for the second group:

$$\rho_\theta^{\otimes(n-\sqrt{n})} \mapsto \rho_{\theta-\theta_1}^{\otimes(n-\sqrt{n})}.$$

For details, see Appendix F. We perform the POVM $M^{w,n-\sqrt{n}}$ for the system whose state is $\rho_{\theta-\theta_1}^{\otimes(n-\sqrt{n})}$; the data is written as $T_{n-\sqrt{n}}$. Then, we decide the final estimated value $\hat{\theta}$ as

$$\hat{\theta} := \theta_1 + \mathrm{sgn}(\xi_{\sqrt{n}} - \theta_1)\,T_{n-\sqrt{n}}.$$



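Proof: Since

$$P_{\theta_1}^{M^{w,n}_{\theta_1}}\left\{|\hat{\theta}-\theta_1| > \epsilon\right\} = P_0^{M^{w,n-\sqrt{n}}}\left\{|T_{n-\sqrt{n}}| > \epsilon\right\},$$

we have

$$\beta(M^w_{\theta_1},\theta_1,\epsilon) = \lim_{n\to\infty}\frac{-1}{n}\log P_{\theta_1}^{M^{w,n}_{\theta_1}}\left\{|\hat{\theta}-\theta_1| > \epsilon\right\} = \lim_{n\to\infty}\frac{n-\sqrt{n}}{n}\cdot\frac{-1}{n-\sqrt{n}}\log P_0^{M^{w,n-\sqrt{n}}}\left\{|T_{n-\sqrt{n}}| > \epsilon\right\} = \beta(M^w,0,\epsilon).$$

As is shown in Appendix G, we have

$$\beta(M^w,0,\epsilon) = \epsilon^2\log\left(1+\frac{1}{\overline{N}}\right),$$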

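which implies (52). Next, we prove the consistency in the case where $\theta > \theta_1$. In this case, it is sufficient to discuss the case where $\theta - \theta_1 > \epsilon > 0$. Since the first measurement $M^{s,\sqrt{n}}$ and the second one $M^{w,n-\sqrt{n}}$ are performed independently, we obtain

$$P_\theta^{M^{w,n}_{\theta_1}}\left\{|\hat{\theta}-\theta| > \epsilon\right\} \le P_{\theta-\theta_1}^{M^{w,n-\sqrt{n}}}\left\{|T_{n-\sqrt{n}} - (\theta-\theta_1)| > \epsilon\right\} + P_\theta^{M^{s,\sqrt{n}}}\left\{\xi_{\sqrt{n}} - \theta_1 < 0\right\}.$$

Proposition 8 guarantees that the first term goes to 0, and Theorem 7 guarantees that the second term goes to 0. Thus, we obtain the consistency of $M^w_{\theta_1}$. Similarly, we can prove the weak consistency in the case where $\theta < \theta_1$. ■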
7.2. Finite dimensional family

In this subsection, we treat the case where the dimension of the Hilbert space $\mathcal{H}$ is $k$ (finite). As for the attainability of the RHS of inequality (45), we have the following lemma.

Lemma 10 Let $\theta_0$ be fixed in $\Theta$. Under Assumptions 1 and 2, the sequence of estimators $M^s_{\theta_0}$ (defined in the following) satisfies the strong consistency condition at $\theta_0$ (defined in Def. 3) and the relation

$$\alpha(M^s_{\theta_0},\theta_0) = \underline{\alpha}(M^s_{\theta_0},\theta_0) = \frac{J_{\theta_0}}{2}. \tag{53}$$

[Assumption 1]: The map $\theta \mapsto \rho_\theta$ is $C^1$ and $\rho_\theta > 0$.
[Assumption 2]: The map $\theta \mapsto \mathrm{Tr}\,\rho_\theta \frac{L_{\theta_0}}{J_{\theta_0}}$ is injective, i.e., one-to-one.

[Construction of $M^s_{\theta_0}$]: We perform the POVM $E\left(\frac{L_{\theta_0}}{J_{\theta_0}}\right)$ for all systems. The estimated value is determined to be the mean value plus $\theta_0$. ■
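The construction can be made concrete for a small example. The sketch below assumes a hypothetical qubit family whose Bloch vector of length `r` rotates in the x-z plane (this family is an illustration, not one treated in the paper). It solves the SLD equation $\frac{1}{2}(L\rho+\rho L) = \frac{d\rho}{d\theta}$ in the eigenbasis of $\rho$ and evaluates the expectation of the estimator "outcome of $E(L_{\theta_0}/J_{\theta_0})$ plus $\theta_0$".

```python
import numpy as np

SX = np.array([[0, 1], [1, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)

def rho(theta, r=0.8):
    """Hypothetical qubit family: Bloch vector r (cos theta, 0, sin theta)."""
    return 0.5 * (np.eye(2) + r * (np.cos(theta) * SX + np.sin(theta) * SZ))

def drho(theta, r=0.8):
    """Derivative of rho(theta) with respect to theta."""
    return 0.5 * r * (-np.sin(theta) * SX + np.cos(theta) * SZ)

def sld(state, dstate):
    """Solve (L state + state L) / 2 = dstate in the eigenbasis of state."""
    p, v = np.linalg.eigh(state)
    d = v.conj().T @ dstate @ v
    l = 2.0 * d / (p[:, None] + p[None, :])
    return v @ l @ v.conj().T

theta0 = 0.3
L0 = sld(rho(theta0), drho(theta0))
J0 = np.trace(rho(theta0) @ L0 @ L0).real   # SLD Fisher information at theta0

def mean_estimate(theta):
    """Expectation of 'outcome of E(L0/J0) plus theta0' when the state is rho(theta)."""
    return theta0 + np.trace(rho(theta) @ L0).real / J0
```

For this family $J_{\theta_0} = r^2$ and the expectation equals $\theta_0 + \sin(\theta-\theta_0)$, so the estimator is unbiased to first order around $\theta_0$ and the map in Assumption 2 is injective near $\theta_0$.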
Proof of Lemma 10: From Assumption 2, the weak consistency is satisfied. Let $\delta > 0$ be a sufficiently small number. Define the function

M s) := Trp.exp (*(*£- . (54) 



Since 



Je 



< oo and Trp e - = 0, we have 

-* * 2 2 P0 \J 0O J 6o ) 

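When $\|\theta-\theta_0\|$ is sufficiently small, the function $x \mapsto \sup_s\left(xs - \log\phi_{\theta,\theta_0}(s)\right)$ is continuous in $(-\delta,\delta)$. Using Cramér's theorem [36], we have

$$\lim_{n\to\infty}\frac{-1}{n}\log P_\theta^{M^{s,n}_{\theta_0}}\left\{|\hat{\theta}-\theta| > \epsilon\right\} = \min\left\{\sup_s\left(\epsilon s - \log\phi_{\theta,\theta_0}(s)\right),\ \sup_{s'}\left(-\epsilon s' - \log\phi_{\theta,\theta_0}(s')\right)\right\}$$

for $\epsilon < \delta$. Taking the limit $\epsilon \to 0$, we have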

— 1 M s ' n 

lim lim — P e /° {|6» — 6» | > e} 

. f sup s (es - \og(f)e,e {s)) sup s (-es - log<p e ,e {s)) 1 1 _ x 
= mm < lim ■ , lim — > = —c a a , 



e^0 e 2 'e^o e 2 



where 



because 



ce,6 '■— Tr / 



Le Tr p$Lg \ 2 



es - log<^ eo (s) = es - log(l + ^-Q^s 2 ) = es - ^c^ s 2 = fs —\ + — . 

1 1 1 \ c e,e J Zc e,e 

The above convergence is uniform for the neighborhood of 6q. Taking the limit 9 — > 9q, 
we have 

Thus, we can check ([53]) and the strong consistency in the neighborhood of 9q. ■ 
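The quadratic behaviour of the Cramér rate for small $\epsilon$, which drives the proof above, can be seen in a toy computation. The sketch assumes a hypothetical centred two-point random variable with variance $c = 2$ in place of the measurement outcome; it computes $\sup_s(\epsilon s - \log\phi(s))$ by Newton iteration and compares it with $\epsilon^2/(2c)$.

```python
import math

VALUES = (-1.0, 2.0)             # hypothetical centred two-point variable
PROBS = (2.0 / 3.0, 1.0 / 3.0)   # mean 0, variance c = 2

def log_mgf_derivatives(s):
    """Return log phi(s) together with its first and second derivatives."""
    w = [p * math.exp(s * x) for p, x in zip(PROBS, VALUES)]
    phi = sum(w)
    m1 = sum(wi * x for wi, x in zip(w, VALUES)) / phi
    m2 = sum(wi * x * x for wi, x in zip(w, VALUES)) / phi
    return math.log(phi), m1, m2 - m1 * m1

def cramer_rate(eps, steps=50):
    """sup_s (eps * s - log phi(s)), by Newton iteration on the concave objective."""
    s = 0.0
    for _ in range(steps):
        _, d1, d2 = log_mgf_derivatives(s)
        s += (eps - d1) / d2
    return eps * s - log_mgf_derivatives(s)[0]

c = sum(p * x * x for p, x in zip(PROBS, VALUES))
eps = 0.01
ratio = cramer_rate(eps) / eps ** 2
```

For small `eps`, `ratio` is close to $1/(2c)$, in line with the approximation $\epsilon s - \log\phi(s) \cong \epsilon s - \frac{1}{2}cs^2$ used in the proof.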
However, this sequence of estimators $M^s_{\theta_0}$ depends on the true parameter $\theta_0$. We should construct a sequence of estimators that satisfies the strong consistency condition and attains the bound $\frac{J_\theta}{2}$ at all points $\theta$. Since such a construction is too difficult, we introduce another strong consistency condition that is weaker than the above and under which inequality (45) holds. We construct a sequence of estimators that satisfies this strong consistency condition and attains the bound given in (45) for all $\theta$ in a weak sense.

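[Second strong consistency condition]: A sequence of estimators $M = \{M^n\}$ is called second strongly consistent if there exists a sequence of functions $\{\underline{\beta}_m(M,\theta,\epsilon)\}_{m=1}^{\infty}$ such that

• $\lim_{m\to\infty}\lim_{\epsilon\to 0}\frac{1}{\epsilon^2}\underline{\beta}_m(M,\theta,\epsilon) = \underline{\alpha}(M,\theta)$.

• $\lim_{\epsilon\to 0}\frac{1}{\epsilon^2}\underline{\beta}_m(M,\theta,\epsilon) \le \underline{\alpha}(M,\theta)$ holds. Its LHS converges locally uniformly w.r.t. $\theta$.

• $\forall m$, $\exists\delta > 0$ s.t. $\underline{\beta}(M,\theta,\epsilon) \ge \underline{\beta}_m(M,\theta,\epsilon)$, for $\delta > \forall\epsilon > 0$.

Similarly to Theorem 2, we can prove inequality (45) under the second strong consistency condition.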

Under these preparations, we state a theorem with respect to the attainability of 
the bound Jg. The following theorem can be regarded as a special case of Theorem 8 of 



Theorem 11 Under Assumptions 1 and 3, the sequence of estimators $M^s_\delta = \{M^{s,n}_\delta\}_{n=1}^{\infty}$ (defined in the following) satisfies the second strong consistency condition and the relations

$$\alpha(M^s_\delta,\theta) = \underline{\alpha}(M^s_\delta,\theta) = (1-\delta)\frac{J_\theta}{2}. \tag{55}$$

The sequence of estimators $M^s_\delta$ is independent of the unknown parameter $\theta$. Every $M^{s,n}_\delta$ is an adaptive estimator, which will be defined in section 8.

Its proof is given in Appendix H.



[Assumption 3]: The following set is compact. 

f J ,7. Tr rinl ^ \ 2 

VOee 





If the state family is included in a bounded closed set consisting of positive definite operators, Assumption 3 is satisfied.

[Construction of $M^s_\delta$]: We perform a faithful POVM $M_f$ (defined in the following) for the first $\delta n$ systems. Then, the data $(\omega_1,\ldots,\omega_{\delta n})$ obey the probability family $\{P^{M_f}_\theta \mid \theta \in \Theta\}$. We denote the maximum likelihood estimator (MLE) w.r.t. the data $(\omega_1,\ldots,\omega_{\delta n})$ by $\check{\theta}$. Next, we perform the measurement $E(L_{\check{\theta}})$ defined by the spectral measure of $L_{\check{\theta}}$ for the other $(1-\delta)n$ systems. Then, we have data $(\omega_{\delta n+1},\ldots,\omega_n)$. We decide the final estimated value as

1 n 

Tr ^=(T^ 22 <*■ 

y ' i=Sn+l 
Definition 12 A POVM $M$ is called faithful if the map $\rho \in S(\mathcal{H}) \mapsto P^M_\rho$ is one-to-one.

An example of a faithful POVM, which is a POVM taking values in the set of pure states on $\mathcal{H}$, is given by $M_h(d\rho) := k\rho\,\nu(d\rho)$, where $\nu$ is the invariant (w.r.t. the action of $SU(\mathcal{H})$) probability measure on the set of pure states on $\mathcal{H}$. As another example, if $L_1,\ldots,L_{k^2-1}$ is a basis of the space of self-adjoint traceless operators, a disjoint random combination of the PVMs $E(L_1),\ldots,E(L_{k^2-1})$ is faithful. Note that a disjoint random combination is defined in section [|.
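The second example can be checked in the qubit case ($k = 2$). The sketch below assumes that the three Pauli matrices serve as the basis $L_1, L_2, L_3$ and that each PVM is chosen with probability 1/3; it uses the standard sufficient criterion that $\rho \mapsto P^M_\rho$ is one-to-one when the POVM elements span all $k \times k$ matrices. The function names are illustrative.

```python
import numpy as np

PAULIS = [
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]

def random_combination(observables):
    """POVM that measures one observable, chosen uniformly at random, and records which."""
    elements = []
    weight = 1.0 / len(observables)
    for obs in observables:
        _, vecs = np.linalg.eigh(obs)
        for i in range(vecs.shape[1]):
            proj = np.outer(vecs[:, i], vecs[:, i].conj())
            elements.append(weight * proj)
    return elements

def is_faithful(elements):
    """Sufficient test: the elements span the whole space of k x k matrices."""
    dim = elements[0].shape[0]
    stacked = np.array([m.reshape(-1) for m in elements])
    return np.linalg.matrix_rank(stacked) == dim * dim

povm = random_combination(PAULIS)
total = sum(povm)
```

Here `is_faithful(povm)` holds, whereas the combination built from a single Pauli matrix spans only a two-dimensional space and fails the test.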

Remark 3 By dividing $n$ systems into $\sqrt{n}$ and $n-\sqrt{n}$ systems, Gill and Massar [[UJ constructed an estimator which asymptotically attains the optimal bound w.r.t. MSE, and Hayashi and Matsumoto [38] constructed a similar estimator by dividing them into $b_n$ and $n-b_n$ systems, where $\lim \frac{b_n}{n} = 0$. However, in our proof, it is difficult to show the attainability of the bound (45) in such a division. There may exist a family in which such an estimator does not attain the bound (45). At least, it is essential in our proof that the number $b_n$ of systems in the first group satisfies $\lim \frac{b_n}{n} > 0$.

Conversely, as is mentioned in Theorems 9 and 13, by dividing $n$ systems into $\sqrt{n}$ and $n-\sqrt{n}$ systems, we can construct an estimator attaining the bound (34) at one point.

We must use quantum correlations in the quantum apparatus to achieve the bound $\frac{\tilde{J}_\theta}{2}$. The following theorem can be easily extended to the multi-parameter case.

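Theorem 13 We assume Assumption 1 and that $D(\rho_{\theta'}\|\rho_{\theta_1}) < \infty$ for $\forall\theta_1, \theta' \in \Theta$. Then, for any $\theta_1 \in \Theta$, the sequence of estimators $M^w_{\theta_1} = \{M^{w,n}_{\theta_1}\}_{n=1}^{\infty}$ satisfies the weak consistency condition j[3li), and the equations

$$\beta(M^w_{\theta_1},\theta_1,\epsilon) = \underline{\beta}(M^w_{\theta_1},\theta_1,\epsilon) = \inf\left\{D(\rho_{\theta'}\|\rho_{\theta_1}) \,\middle|\, |\theta_1-\theta'| > \epsilon\right\}, \tag{56}$$

$$\alpha(M^w_{\theta_1},\theta_1) = \underline{\alpha}(M^w_{\theta_1},\theta_1) = \frac{\tilde{J}_{\theta_1}}{2}. \tag{57}$$

The sequence of estimators $M^w_{\theta_1}$ depends on the unknown parameter $\theta_1$ but not on $\epsilon > 0$. Its proof is given in Appendix I. In the following construction, $M^{w,n}_{\theta_1}$ is constructed from the PVM $E^n_{\theta_1}$, which is defined from a group-theoretical viewpoint in Definition 29 in Appendix J.3.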

[Construction of $M^{w,n}_{\theta_1}$]: We divide the $n$ systems into two groups. We perform a faithful POVM $M_f$ for the first group of $\sqrt{n}$ systems. Then, the data $(\omega_1,\ldots,\omega_{\sqrt{n}})$ obey the probability $P^{M_f\times\sqrt{n}}_\theta$. We let $\check{\theta}$ be the MLE of the data $(\omega_1,\ldots,\omega_{\sqrt{n}})$ under the probability family $\{P^{M_f}_\theta \mid \theta \in \Theta\}$. Next, we perform the correlational PVM $E^{n-\sqrt{n}}_{\theta_1}$ for the composite system which consists of the other group of $n-\sqrt{n}$ systems. Then, the
data oj obeys the probability P/ 1 . If e^ 1 -' 5 ™-^^^?^ 1 (w) > P/ 1 (uo), the 
estimated value T n is decided to be 9\, where 5 n := -\. If not, T n is decided to be 9. ■ 

n5 
The following lemma, proven in Appendix J, plays an important role in the proof of Theorem 13.

Lemma 14 For three parameters $\theta_0$, $\theta_1$ and $\theta_2$ and $\delta > 0$, the inequalities
P^ j-i logP^ 1 H + Tr Peo logp, 2 > 6 

((k + 1) log(n +1) 
sup (5 - Trp do log p d2 )t - t logTrp^p^ - * ) (58) 
o<t<i n 

\-logP^(u)-Trp eo logp ei >d 

I 71 



< exp -n ( sup (5 + Tr p 9o log p 01 )t - logTr p 0o p t 6i J 
\o<t J 



(59) 



hold. 

We obtain the following theorem as a review of the above discussion. 

Theorem 15 From Theorems^, and[7J and Lemma \T(\, we have the equations 

sup limsup \p(M, 6, e) = sup liminf ^-(3{M, 9, e) = 



sup liminf -i/?(M, 0, e) = — (61) 
M: SC ate e ~ 2 

as an operational comparison of Jg and Jg under Assumptions 1, 2 and 3. We can 
replace (3(M,9,e) with f3(M,9,e) in equations 

We can also prove fl3"0|) as a consequence of equations ( j60| ) and (|6T|). 



8. Adaptive estimators 

In this section, we assume that the dimension of the Hilbert space $\mathcal{H}$ is finite. We consider estimators whose POVM is adaptively chosen from the data. We choose the $l$-th POVM $M_l(\vec{\omega}_{l-1})$ on $\mathcal{H}$ from the $l-1$ data $\vec{\omega}_{l-1} := (\omega_1,\ldots,\omega_{l-1})$. Its POVM $M^n$ is described by

$$M^n(\vec{\omega}_n) := M_1(\omega_1) \otimes M_2(\omega_1;\omega_2) \otimes \cdots \otimes M_n(\vec{\omega}_{n-1};\omega_n). \tag{62}$$
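As an illustration of (62) with $n = 2$, the following sketch builds a two-qubit adaptive POVM in which the second measurement depends on the first outcome. The specific choice (measure $\sigma_z$ first, then $\sigma_z$ or $\sigma_x$) is a hypothetical example, not one used in the paper.

```python
import numpy as np

def pvm(observable):
    """Rank-one spectral projectors of a non-degenerate observable."""
    _, vecs = np.linalg.eigh(observable)
    return [np.outer(vecs[:, i], vecs[:, i].conj()) for i in range(vecs.shape[1])]

SX = np.array([[0, 1], [1, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)

first = pvm(SZ)                      # M_1(omega_1)
second = {0: pvm(SZ), 1: pvm(SX)}    # M_2(omega_1; .) is selected by the first outcome

# M^2(omega_1, omega_2) = M_1(omega_1) (x) M_2(omega_1; omega_2)
adaptive = {
    (w1, w2): np.kron(first[w1], second[w1][w2])
    for w1 in range(2)
    for w2 in range(2)
}
total = sum(adaptive.values())
min_eig = min(np.linalg.eigvalsh(m).min() for m in adaptive.values())
```

The four operators are positive semi-definite and sum to the identity on the two-qubit space, so the adaptive scheme defines a valid POVM, and each element is a tensor product, so it is also separable.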

In this setting, the estimator is written as the pair $\mathcal{E}^n = (M^n,T_n)$ of the POVM $M^n$ satisfying (62) and the function $T_n : \Omega^n \to \Theta$. Such an estimator $\mathcal{E}^n$ is called an adaptive estimator. As a larger class of POVMs, the separable POVM is well known. A POVM $M^n$ on $\mathcal{H}^{\otimes n}$ is called separable if it is written as

$$M^n = \{M_1(\omega) \otimes \cdots \otimes M_n(\omega)\}_{\omega\in\Omega_n}$$

on $\mathcal{H}^{\otimes n}$, where $M_i(\omega)$ is a positive semi-definite operator on $\mathcal{H}$. For any separable estimator $(M^n,T_n)$, the relations

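$$D^{M^n}(\theta\|\theta') = \sum_{\omega\in\Omega_n}\prod_{i=1}^{n}\mathrm{Tr}\,\rho_\theta M_i(\omega)\left(\log\prod_{i=1}^{n}\mathrm{Tr}\,\rho_\theta M_i(\omega) - \log\prod_{i=1}^{n}\mathrm{Tr}\,\rho_{\theta'} M_i(\omega)\right)$$

$$= \sum_{\omega\in\Omega_n}\sum_{i=1}^{n} a_{\theta,i}(\omega)\,\mathrm{Tr}\,\rho_\theta M_i(\omega)\left(\log\mathrm{Tr}\,\rho_\theta M_i(\omega) - \log\mathrm{Tr}\,\rho_{\theta'} M_i(\omega)\right)$$

$$= \sum_{i=1}^{n} D^{M_{\theta,i}}(\theta\|\theta') \le n\sup_{M:\,\mathrm{POVM\ on\ }\mathcal{H}} D^M(\theta\|\theta') \tag{63}$$

hold, where the POVM $M_{\theta,i}$ on $\mathcal{H}$ is defined by

$$M_{\theta,i}(\omega) := a_{\theta,i}(\omega)M_i(\omega), \quad a_{\theta,i}(\omega) := \prod_{j\ne i}\mathrm{Tr}\,\rho_\theta M_j(\omega).$$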

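Theorem 16 If a sequence of separable estimators $M = \{\mathcal{E}^n\} = \{(M^n,T_n)\}$ satisfies the weak consistency condition, the inequalities

$$\beta(M,\theta_1,\epsilon) \le \inf_{\theta:\,|\theta-\theta_1|>\epsilon}\ \sup_{M:\,\mathrm{POVM\ on\ }\mathcal{H}} D^M(\theta\|\theta_1) \tag{64}$$

$$\alpha(M,\theta_1) \le \frac{J_{\theta_1}}{2} \tag{65}$$

hold.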

Proof: Similarly to fl3~5|), the monotonicity of quantum relative entropy yields 



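$$\frac{-1}{n}\log P^{M^n}_{\theta_1}\left\{|T_n(\vec{\omega}_n)-\theta_1| > \epsilon\right\} \le \frac{D^{M^n}(\theta\|\theta_1) + h(P_n)}{nP_n},$$

where $P_n := P^{M^n}_\theta\{|T_n(\vec{\omega}_n)-\theta_1| > \epsilon\}$. From the weak consistency, we have $P_n \to 1$. Thus, we obtain (64) from (63). Since $\mathcal{H}$ is finite-dimensional, the set of extremal points of POVMs is compact. Therefore, the convergence $\lim_{\epsilon\to 0}\frac{1}{\epsilon^2}D^M(\theta_1+\epsilon\|\theta_1)$ is uniform w.r.t. $M$. This implies that

$$\lim_{\epsilon\to 0}\frac{1}{\epsilon^2}\sup_{M:\,\mathrm{POVM\ on\ }\mathcal{H}} D^M(\theta_1+\epsilon\|\theta_1) = \sup_{M:\,\mathrm{POVM\ on\ }\mathcal{H}}\lim_{\epsilon\to 0}\frac{1}{\epsilon^2}D^M(\theta_1+\epsilon\|\theta_1) = \frac{J_{\theta_1}}{2}. \tag{66}$$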

The last equation is derived from (p9[). ■ 
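The first inequality of the proof is an instance of the data-processing inequality applied to a two-outcome coarse-graining: for any event with probability $p_n$ under $\theta$ and $q_n$ under $\theta_1$, one has $-\log q_n \le (D + h(p_n))/p_n$, where $h$ is the binary entropy. A minimal sketch with hypothetical outcome distributions:

```python
import math

def kl(p, q):
    """Classical relative entropy D(p || q) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def binary_entropy(x):
    """h(x) = -x log x - (1 - x) log(1 - x)."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log(x) - (1 - x) * math.log(1 - x)

# Hypothetical outcome distributions under theta (p) and under theta_1 (q).
p = [0.05, 0.15, 0.30, 0.50]
q = [0.40, 0.30, 0.20, 0.10]
event = [2, 3]   # outcomes for which the estimate is farther than eps from theta_1

p_n = sum(p[i] for i in event)   # probability of the event under theta
q_n = sum(q[i] for i in event)   # probability of the event under theta_1
lhs = -math.log(q_n)
rhs = (kl(p, q) + binary_entropy(p_n)) / p_n
```

When $p_n \to 1$, as weak consistency guarantees, the right-hand side tends to the relative entropy itself, which is how (64) follows.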

The preceding theorem holds for any adaptive estimator. As a simple extension, we can define an $m$-adaptive estimator that satisfies (62) when every $M_l(\vec{\omega}_{l-1})$ is a POVM on $\mathcal{H}^{\otimes m}$. As a corollary of Theorem 16, we have the following.

Corollary 17 If a sequence of $m$-adaptive estimators $M = \{\mathcal{E}^n\} = \{(M^n,T_n)\}$ satisfies the weak consistency condition, then the inequalities

$$\beta(M,\theta_1,\epsilon) \le \inf_{\theta:\,|\theta-\theta_1|>\epsilon}\ \sup_{M:\,\mathrm{POVM\ on\ }\mathcal{H}^{\otimes m}} \frac{1}{m}D^M(\theta\|\theta_1) \tag{67}$$

$$\alpha(M,\theta_1) \le \frac{J_{\theta_1}}{2} \tag{68}$$

hold.



Je 
2 



Inn Inn sup ^(3M,9,e) = (69) 
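The part of $\ge$ holds because an adaptive estimator attaining the bound is constructed in Theorem 11, and the part of $\le$ follows from (67) and the equation

$$\lim_{\epsilon\to 0}\ \sup_{M:\,\mathrm{POVM\ on\ }\mathcal{H}^{\otimes m}}\frac{1}{\epsilon^2 m}D^M(\theta+\epsilon\|\theta) = \sup_{M:\,\mathrm{POVM\ on\ }\mathcal{H}^{\otimes m}}\ \lim_{\epsilon\to 0}\frac{1}{\epsilon^2 m}D^M(\theta+\epsilon\|\theta) = \frac{J_\theta}{2},$$

which is proven in a similar manner as (66).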



9. Difference in order among limits and supremums 
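Theorem 15 yields another operational comparison as

$$\sup_{M:\,\mathrm{SC\ at\ }\theta}\ \liminf_{\epsilon\to 0}\frac{1}{\epsilon^2}\beta(M,\theta,\epsilon) = \frac{J_\theta}{2} \tag{70}$$

$$\lim_{\epsilon\to 0}\frac{1}{\epsilon^2}\sup_{M:\,\mathrm{SC\ at\ }\theta}\beta(M,\theta,\epsilon) = \frac{\tilde{J}_\theta}{2}. \tag{71}$$

Equation (70) equals (61) and equation (71) follows from Theorem 18. Therefore, the difference between $\frac{J_\theta}{2}$ and $\frac{\tilde{J}_\theta}{2}$ can be regarded as the difference in the order of $\liminf_{\epsilon\to 0}$ and $\sup_{M:\,\mathrm{SC\ at\ }\theta}$. This comparison was naively discussed by Nagaoka [22, 23].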

Theorem 18 We adopt Assumption 1 in Theorem\T^ and D{p0i\\pQ 1 ) < oo for\/9' e 0. 
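For any $\delta > 0$, there exists a sequence $M^{m,\delta} = \{M^{m,\delta,n}\}$ of $m$-adaptive estimators satisfying the strong consistency condition and the inequality

$$\lim_{n\to\infty}\frac{-1}{nm}\log P_{\theta_0}^{M^{m,\delta,n}}\left\{|\hat{\theta}-\theta_0| > \epsilon\right\} \ge (1-\delta)\inf\left\{D(\theta\|\theta_0) \,\middle|\, |\theta-\theta_0| > \epsilon\right\} - \frac{(1-\delta)(k-1)\log(m+1)}{m}.$$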

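However, using Theorem 18, we obtain a stronger equation than (71):

$$\lim_{\epsilon\to 0}\ \lim_{m\to\infty}\ \sup_{M:\,m\text{-ASC at }\theta}\frac{1}{\epsilon^2}\beta(M,\theta,\epsilon) = \frac{\tilde{J}_\theta}{2}, \tag{72}$$

where $m$-ASC at $\theta$ denotes $m$-adaptive and strongly consistent at $\theta$. This equation is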



in contrast with (|B"9"D. Of course, the part of < for ( [72] ) follows from (|5T[) . The part of 
> for is derived from the above theorem. 



The following two lemmas are essential for our proof of Theorem 18.

Lemma 19 For two parameters $\theta_1$ and $\theta_0$, the inequality

$$mD(\theta_0\|\theta_1) - (k-1)\log(m+1) \le D^{E^m_{\theta_1}}(\theta_0\|\theta_1) \le mD(\theta_0\|\theta_1) \tag{73}$$

holds, where the PVM $E^m_{\theta_1}$ on $\mathcal{H}^{\otimes m}$ is defined in Appendix J.3. It is independent of $\theta_0$.

This lemma was proven by Hayashi p7[ and can be regarded as an improvement of Hiai and Petz's result [10]. However, Hiai and Petz's original version is sufficient for our proof of Theorem 18. For the reader's convenience, the proof is presented in Appendix

Lemma 20 Let $Y$ be a curved exponential family and $X$ be an exponential family including $Y$. For a curved exponential family and an exponential family, see Chap. 4 in Amari and Nagaoka JJ/ or Barndorff-Nielsen USQJ. In this setting, for $n$-i.i.d. data, the MLE $T^{\mathrm{MLE}}_{X,n}(\vec{\omega}_n)$ for the exponential family $X$ is a sufficient statistic for the curved exponential family $Y$, where $\vec{\omega}_n := (\omega_1,\ldots,\omega_n)$. Using a map $T : X \to Y$, we can define an estimator $T \circ T^{\mathrm{MLE}}_{X,n}$, and for an estimator $T_Y$, there exists a map $T : X \to Y$ such that $T_Y = T \circ T^{\mathrm{MLE}}_{X,n}$. We can identify a map $T$ from $X$ to $Y$ with a sequence of estimators $T \circ T^{\mathrm{MLE}}_{X,n}(\vec{\omega}_n)$. We define the map $T_{\theta_0} : X \to Y$ as

$$T_{\theta_0}(x) := \mathop{\mathrm{argmin}}_{\theta\in Y}\left\{D(x\|\theta) \,\middle|\, D(\theta\|\theta_0) \le D(x\|\theta_0)\right\}. \tag{74}$$

When $Y$ is an exponential family (i.e., flat), $T_{\theta_0}$ coincides with the projection to $Y$. Then, the sequence of estimators corresponding to the map $T_{\theta_0}$ satisfies the strong consistency at $\theta_0$ and the equation

$$\lim_{n\to\infty}\frac{-1}{n}\log p^n_{\theta_0}\left\{\|T_{\theta_0}\circ T^{\mathrm{MLE}}_{X,n}(\vec{\omega}_n)-\theta_0\| > \epsilon\right\} = \inf\left\{D(\theta\|\theta_0) \,\middle|\, \|\theta-\theta_0\| > \epsilon\right\} \tag{75}$$

holds.

Proof: It is well known that for any subset $X' \subset X$, the equation

$$\lim_{n\to\infty}\frac{-1}{n}\log p^n_{\theta_0}\left\{T^{\mathrm{MLE}}_{X,n}(\vec{\omega}_n) \in X'\right\} = \inf_{x\in X'} D(x\|\theta_0) \tag{76}$$

holds. For the reader's convenience, we present a proof of (76) in Appendix K. Thus, equation (75) follows from (74) and (76). If $Y$ is an exponential family, then the estimator $T_{\theta_0}\circ T^{\mathrm{MLE}}_{X,n}$ coincides with the MLE and satisfies the strong consistency. Otherwise, we choose a neighborhood $U$ of $\theta_0$ so that we can approximate the neighborhood $U$ by the tangent space. The estimator $T_{\theta_0}\circ T^{\mathrm{MLE}}_{X,n}$ can be approximated by the MLE and satisfies the strong consistency at $U$. Thus, it also satisfies the strong consistency at $\theta_0$. ■
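Equation (76) can be illustrated in the simplest exponential family. The sketch below assumes Bernoulli data with true parameter `p0` and the set $X' = \{x \ge a\}$, for which the MLE is the sample mean and the infimum of $D(x\|\theta_0)$ over $X'$ is attained at $x = a$. The binomial tail is summed exactly in the log domain.

```python
import math

def log_binomial_tail(n, p, k_min):
    """log P{K >= k_min} for K ~ Binomial(n, p), summed stably in the log domain."""
    logs = [
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log(1 - p)
        for k in range(k_min, n + 1)
    ]
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs))

def bernoulli_kl(a, p):
    """D(a || p) between Bernoulli distributions, in nats."""
    return a * math.log(a / p) + (1 - a) * math.log((1 - a) / (1 - p))

p0, a = 0.3, 0.5   # true parameter and boundary of the set X'
rates = [-log_binomial_tail(n, p0, math.ceil(a * n)) / n for n in (100, 1000, 4000)]
limit = bernoulli_kl(a, p0)
```

The finite-$n$ rates decrease towards `limit`, the relative entropy at the boundary point.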
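Proof of Theorem 18: Let $M = \{M_i\}$ be a faithful POVM defined in section 7.2 such that the number of operators $M_i$ is finite. For any $m$ and any $\delta > 0$, we define the POVM $M^m_{\theta_0}$ to be the disjoint random combination of $M\times m$ and $E^m_{\theta_0}$ with the ratio $\delta : 1-\delta$. Note that a disjoint random combination is defined in section [|. From the definition of $M^m_{\theta_0}$, the inequality

$$(1-\delta)D^{E^m_{\theta_0}}(\theta\|\theta_0) \le D^{M^m_{\theta_0}}(\theta\|\theta_0) \tag{77}$$

holds. Since the map $\theta \mapsto P^M_\theta$ is one-to-one, the map $\theta \mapsto P^{M^m_{\theta_0}}_\theta$ is also one-to-one. Since $M$ and $E^m_{\theta_0}$ are finite resolutions of the identity, the one-parameter family $\{P^{M^m_{\theta_0}}_\theta \mid \theta \in \Theta\}$ is a subset of the multinomial distributions $X$, which is an exponential family. Applying Lemma 20, we have

$$\lim_{n\to\infty}\frac{-1}{nm}\log P_{\theta_0}^{M^m_{\theta_0}\times n}\left\{|T_{\theta_0}\circ T^{\mathrm{MLE}}_{X,n}-\theta_0| > \epsilon\right\} = \frac{1}{m}\inf\left\{D^{M^m_{\theta_0}}(\theta\|\theta_0) \,\middle|\, |\theta-\theta_0| > \epsilon\right\}$$

$$\ge \frac{1-\delta}{m}\inf\left\{D^{E^m_{\theta_0}}(\theta\|\theta_0) \,\middle|\, |\theta-\theta_0| > \epsilon\right\}$$

$$\ge (1-\delta)\inf\left\{D(\theta\|\theta_0) \,\middle|\, |\theta-\theta_0| > \epsilon\right\} - \frac{(1-\delta)(k-1)\log(m+1)}{m},$$

where the first inequality follows from (77) and the second inequality follows from (73). ■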

Remark 4 In the case of the one-parameter equatorial spin 1/2 system state family, the map $\theta \mapsto P^{E^m_{\theta_0}}_\theta$ is not one-to-one. Therefore, we must treat not $E^m_{\theta_0}$ but $M^m_{\theta_0}$.

Conclusions 

It has been clarified that the SLD Fisher information $J_\theta$ gives the essential large deviation bound in quantum estimation and the KMB Fisher information $\tilde{J}_\theta$ gives the large deviation bound of consistent superefficient estimators. Since estimators attaining the bound $\frac{\tilde{J}_\theta}{2}$ are unnatural, the bound $\frac{J_\theta}{2}$ is more important from the viewpoint of quantum estimation than the bound $\frac{\tilde{J}_\theta}{2}$. On the other hand, as is mentioned in Appendix A, concerning a quantum analogue of information geometry from the viewpoint of e-connections, KMB is the most natural among the quantum versions of Fisher information. The interpretation of these two facts, which seem to contradict each other, remains a problem. Similarly, it is a future problem to explain geometrically the relationship between the change of the orders of limits and the difference between the two quantum analogues of Fisher information.
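Appendix A. Brief review of information-geometrical properties of $J_\theta$, $\tilde{J}_\theta$ and $\check{J}_\theta$

The quantum analogues of Fisher information $J_\theta$, $\tilde{J}_\theta$ and $\check{J}_\theta$ are obtained from the inner products $J_\rho$, $\tilde{J}_\rho$ and $\check{J}_\rho$ on the linear space consisting of self-adjoint operators:

$$\tilde{J}_\rho(A,B) := \mathrm{Tr}\,A\tilde{L}_B, \quad \int_0^1 \rho^t\tilde{L}_B\rho^{1-t}\,dt = B$$

$$J_\rho(A,B) := \mathrm{Tr}\,AL_B, \quad \frac{1}{2}(L_B\rho + \rho L_B) = B$$

$$\check{J}_\rho(A,B) := \mathrm{Tr}\,A\check{L}_B, \quad B = \rho\check{L}_B$$

in the following way:

$$\tilde{J}_\theta = \tilde{J}_{\rho_\theta}\left(\frac{d\rho_\theta}{d\theta},\frac{d\rho_\theta}{d\theta}\right), \quad J_\theta = J_{\rho_\theta}\left(\frac{d\rho_\theta}{d\theta},\frac{d\rho_\theta}{d\theta}\right), \quad \check{J}_\theta = \check{J}_{\rho_\theta}\left(\frac{d\rho_\theta}{d\theta},\frac{d\rho_\theta}{d\theta}\right).$$

In the multi-dimensional case, these are regarded as metrics as follows. For example, we can define a metric

$$\left\langle\frac{\partial}{\partial\theta^i},\frac{\partial}{\partial\theta^j}\right\rangle_\theta := J_{\rho_\theta}\left(\frac{\partial\rho_\theta}{\partial\theta^i},\frac{\partial\rho_\theta}{\partial\theta^j}\right) \tag{A.1}$$

on the tangent space at $\theta$, and the RHS of (A.1) is called the SLD Fisher matrix.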

In the quantum setting, any information processing is described by a trace-preserving CP (completely positive) map $C : S(\mathcal{H}) \to S(\mathcal{H}')$. These inner products satisfy the monotonicity:

$$J_{\rho_\theta}\left(\frac{d\rho_\theta}{d\theta},\frac{d\rho_\theta}{d\theta}\right) \ge J_{C(\rho_\theta)}\left(\frac{dC(\rho_\theta)}{d\theta},\frac{dC(\rho_\theta)}{d\theta}\right)$$

$$\tilde{J}_{\rho_\theta}\left(\frac{d\rho_\theta}{d\theta},\frac{d\rho_\theta}{d\theta}\right) \ge \tilde{J}_{C(\rho_\theta)}\left(\frac{dC(\rho_\theta)}{d\theta},\frac{dC(\rho_\theta)}{d\theta}\right)$$

$$\check{J}_{\rho_\theta}\left(\frac{d\rho_\theta}{d\theta},\frac{d\rho_\theta}{d\theta}\right) \ge \check{J}_{C(\rho_\theta)}\left(\frac{dC(\rho_\theta)}{d\theta},\frac{dC(\rho_\theta)}{d\theta}\right)$$

for a one-parametric density family $\{\rho_\theta \in S(\mathcal{H}) \mid \theta \in \Theta \subset \mathbb{R}\}$. These inequalities can be regarded as the quantum versions of (^). An inner product satisfying the above is called a monotone inner product. According to Petz 0, the inner product $\check{J}_\rho$ is the maximum one among normalized monotone inner products, and the inner product $J_\rho$ is the minimum one.
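The three inner products can be evaluated in the eigenbasis of $\rho$, where each defining equation is solved entrywise. The sketch below assumes a full-rank state and a hypothetical qubit family with a rotating Bloch vector of length `r`; the function and variable names are illustrative. With eigenvalues $p_i$ and $B = d\rho_\theta/d\theta$, the SLD, KMB and RLD quantities weight $|B_{ij}|^2$ by the inverse of the arithmetic, logarithmic and harmonic mean of $p_i$ and $p_j$, respectively.

```python
import numpy as np

def fisher_informations(state, dstate):
    """SLD, KMB and RLD Fisher information of a full-rank state with derivative dstate."""
    p, v = np.linalg.eigh(state)
    b2 = np.abs(v.conj().T @ dstate @ v) ** 2        # |B_ij|^2 in the eigenbasis
    pi, pj = p[:, None], p[None, :]
    arithmetic = (pi + pj) / 2.0
    same = np.isclose(pi, pj)
    # Logarithmic mean (p_i - p_j) / (log p_i - log p_j); it equals p_i when p_i = p_j.
    denom = np.where(same, 1.0, np.log(pi) - np.log(pj))
    logarithmic = np.where(same, pi, (pi - pj) / denom)
    inv_harmonic = (1.0 / pi + 1.0 / pj) / 2.0
    sld = (b2 / arithmetic).sum()
    kmb = (b2 / logarithmic).sum()
    rld = (b2 * inv_harmonic).sum()
    return sld, kmb, rld

SX = np.array([[0, 1], [1, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)
r, theta = 0.8, 0.3
state = 0.5 * (np.eye(2) + r * (np.cos(theta) * SX + np.sin(theta) * SZ))
dstate = 0.5 * r * (-np.sin(theta) * SX + np.cos(theta) * SZ)
sld, kmb, rld = fisher_informations(state, dstate)
```

Because the harmonic mean never exceeds the logarithmic mean, which never exceeds the arithmetic mean, the three values are ordered as SLD ≤ KMB ≤ RLD, consistent with the minimality and maximality statement above.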

In the information geometry community, we usually discuss the torsion. As is 
known within this community, a-connection is a generalization of e-connection. The 
torsion of a-connection concerning Fisher inner product vanishes in any distribution 
familyl]. In quantum setting, we can define the e-connections with respect to several 
quantum Fisher inner products. One may expect that in a quantum setting, its torsion 
vanishes in any density family. However, for only the inner product J p , the torsion of e- 
connection vanishes in any density family Thus, KMB Fisher information seems the 
most natural quantum analogue of Fisher information, from an information-geometrical 
viewpoint. 
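Appendix B. Proof of (15)

From (fn|), we can calculate as

$$D(\rho_{\theta+\epsilon}\|\rho_\theta) = \mathrm{Tr}\left(\rho_{\theta+\epsilon}(\log\rho_{\theta+\epsilon} - \log\rho_\theta)\right) \cong \mathrm{Tr}\left(\left(\rho_\theta + \frac{d\rho_\theta}{d\theta}\epsilon\right)\left(\frac{d\log\rho_\theta}{d\theta}\epsilon + \frac{1}{2}\frac{d^2\log\rho_\theta}{d\theta^2}\epsilon^2\right)\right)$$

$$\cong \mathrm{Tr}\left(\rho_\theta\frac{d\log\rho_\theta}{d\theta}\right)\epsilon + \left(\mathrm{Tr}\left(\frac{d\rho_\theta}{d\theta}\frac{d\log\rho_\theta}{d\theta}\right) + \frac{1}{2}\mathrm{Tr}\left(\rho_\theta\frac{d^2\log\rho_\theta}{d\theta^2}\right)\right)\epsilon^2. \tag{B.1}$$

Next, we calculate the above coefficients:

$$\mathrm{Tr}\left(\rho_\theta\tilde{L}_\theta\right) = \int_0^1 \mathrm{Tr}\left(\rho_\theta^t\tilde{L}_\theta\rho_\theta^{1-t}\right)dt = \mathrm{Tr}\left(\frac{d\rho_\theta}{d\theta}\right) = 0. \tag{B.2}$$

Using (B.2) and (0), we have

$$\mathrm{Tr}\left(\rho_\theta\frac{d^2\log\rho_\theta}{d\theta^2}\right) = \frac{d}{d\theta}\left(\mathrm{Tr}\left(\rho_\theta\frac{d\log\rho_\theta}{d\theta}\right)\right) - \mathrm{Tr}\left(\frac{d\rho_\theta}{d\theta}\frac{d\log\rho_\theta}{d\theta}\right) = -\mathrm{Tr}\left(\frac{d\rho_\theta}{d\theta}\tilde{L}_\theta\right) = -\tilde{J}_\theta. \tag{B.3}$$

From (B.1), (B.2) and (B.3), we obtain (15).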
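The outcome of this expansion, that $D(\rho_{\theta+\epsilon}\|\rho_\theta)/\epsilon^2$ tends to $\tilde{J}_\theta/2$, can be checked numerically. The sketch assumes a hypothetical qubit family with a rotating Bloch vector of length `r`; the closed form used for $\tilde{J}_\theta$ holds for this particular family and was derived by hand for the comparison.

```python
import numpy as np

SX = np.array([[0, 1], [1, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)

def state(theta, r=0.8):
    """Hypothetical qubit family: Bloch vector r (cos theta, 0, sin theta)."""
    return 0.5 * (np.eye(2) + r * (np.cos(theta) * SX + np.sin(theta) * SZ))

def logm_psd(a):
    """Matrix logarithm of a positive definite Hermitian matrix."""
    w, v = np.linalg.eigh(a)
    return (v * np.log(w)) @ v.conj().T

def relative_entropy(rho, sigma):
    """Quantum relative entropy D(rho || sigma) = Tr rho (log rho - log sigma)."""
    return np.trace(rho @ (logm_psd(rho) - logm_psd(sigma))).real

r, theta, eps = 0.8, 0.3, 1e-3
ratio = relative_entropy(state(theta + eps), state(theta)) / eps ** 2
kmb = 0.5 * r * np.log((1 + r) / (1 - r))   # KMB Fisher information of this family
```

The computed `ratio` agrees with `kmb / 2` up to terms of order $\epsilon^2$.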



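Appendix C. Proof of Lemma 3

We define the unitary operator $U_\epsilon$ as

$$b^2(\rho_\theta,\rho_{\theta+\epsilon}) = 2\left(1 - \mathrm{Tr}\left|\sqrt{\rho_\theta}\sqrt{\rho_{\theta+\epsilon}}\right|\right) = \mathrm{Tr}\left(\sqrt{\rho_\theta} - \sqrt{\rho_{\theta+\epsilon}}U_\epsilon\right)\left(\sqrt{\rho_\theta} - \sqrt{\rho_{\theta+\epsilon}}U_\epsilon\right)^*.$$

Letting $W(\epsilon)$ be $\sqrt{\rho_{\theta+\epsilon}}U_\epsilon$, we have

$$b^2(\rho_\theta,\rho_{\theta+\epsilon}) = \mathrm{Tr}\,(W(0)-W(\epsilon))(W(0)-W(\epsilon))^* \cong \mathrm{Tr}\left(\frac{dW}{d\epsilon}(0)\epsilon\right)\left(\frac{dW}{d\epsilon}(0)\epsilon\right)^* = \mathrm{Tr}\,\frac{dW}{d\epsilon}(0)\frac{dW}{d\epsilon}(0)^*\,\epsilon^2.$$

As is proven in the following discussion, the SLD $L$ satisfies

$$\frac{dW}{d\epsilon}(0) = \frac{1}{2}LW(0). \tag{C.1}$$

Therefore, we have

$$b^2(\rho_\theta,\rho_{\theta+\epsilon}) \cong \mathrm{Tr}\,\frac{1}{4}LW(0)W(0)^*L\,\epsilon^2 = \frac{1}{4}\mathrm{Tr}\,L^2\rho_\theta\,\epsilon^2.$$

We obtain (38). It is sufficient to show (C.1).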

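From the definition of the Bures distance, we have

$$b^2(\rho_\theta,\rho_{\theta+\epsilon}) = \min_{U:\,\mathrm{unitary}} \mathrm{Tr}\left(\sqrt{\rho_\theta} - \sqrt{\rho_{\theta+\epsilon}}U\right)\left(\sqrt{\rho_\theta} - \sqrt{\rho_{\theta+\epsilon}}U\right)^*$$

$$= 2 - \max_{U:\,\mathrm{unitary}} \mathrm{Tr}\left(\sqrt{\rho_\theta}\sqrt{\rho_{\theta+\epsilon}}U^* + U\sqrt{\rho_{\theta+\epsilon}}\sqrt{\rho_\theta}\right)$$

$$= 2 - \mathrm{Tr}\left(\left|\sqrt{\rho_\theta}\sqrt{\rho_{\theta+\epsilon}}\right| + \left|\sqrt{\rho_{\theta+\epsilon}}\sqrt{\rho_\theta}\right|\right)$$

$$= 2 - \mathrm{Tr}\left(\sqrt{\rho_\theta}\sqrt{\rho_{\theta+\epsilon}}U(\epsilon)^* + U(\epsilon)\sqrt{\rho_{\theta+\epsilon}}\sqrt{\rho_\theta}\right),$$

which implies that $\sqrt{\rho_\theta}\sqrt{\rho_{\theta+\epsilon}}U(\epsilon)^* = U(\epsilon)\sqrt{\rho_{\theta+\epsilon}}\sqrt{\rho_\theta}$. Therefore, $W(0)W(\epsilon)^* = W(\epsilon)W(0)^*$. Taking the derivative, we have

$$W(0)\frac{dW}{d\epsilon}(0)^* = \frac{dW}{d\epsilon}(0)W(0)^*,$$

which implies that there exists a self-adjoint operator $L$ such that

$$\frac{dW}{d\epsilon}(0) = \frac{1}{2}LW(0).$$

Since $\rho_{\theta+\epsilon} = W(\epsilon)W(\epsilon)^*$, we have

$$\frac{d\rho_\theta}{d\theta} = \frac{1}{2}\left(LW(0)W(0)^* + W(0)W(0)^*L\right).$$

Thus, the operator $L$ coincides with the SLD.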


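Appendix D. Proof of (43)

Let $M = \{M_\omega\}$ be an arbitrary POVM. We choose the unitary $U$ satisfying

$$U\sigma^{1/2}\rho^{1/2} = \sqrt{\rho^{1/2}\sigma\rho^{1/2}}.$$

Using the Schwarz inequality, we have

$$\sqrt{P^M_\rho(\omega)}\sqrt{P^M_\sigma(\omega)} = \sqrt{\mathrm{Tr}\left(M_\omega^{1/2}\sigma^{1/2}U^*\right)^*\left(M_\omega^{1/2}\sigma^{1/2}U^*\right)}\ \sqrt{\mathrm{Tr}\left(M_\omega^{1/2}\rho^{1/2}\right)^*\left(M_\omega^{1/2}\rho^{1/2}\right)}$$

$$\ge \left|\mathrm{Tr}\left(M_\omega^{1/2}\sigma^{1/2}U^*\right)^*\left(M_\omega^{1/2}\rho^{1/2}\right)\right| = \left|\mathrm{Tr}\,U\sigma^{1/2}M_\omega\rho^{1/2}\right|.$$

Therefore,

$$\sum_\omega \sqrt{P^M_\rho(\omega)}\sqrt{P^M_\sigma(\omega)} \ge \sum_\omega\left|\mathrm{Tr}\,U\sigma^{1/2}M_\omega\rho^{1/2}\right| \ge \left|\mathrm{Tr}\,U\sigma^{1/2}\rho^{1/2}\right| = \mathrm{Tr}\sqrt{\rho^{1/2}\sigma\rho^{1/2}}.$$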



Appendix E. Proof of Lemma 



Let m and e be an arbitrary positive integer and an arbitrary positive real number, 
respectively. There exists a sufficiently large integer N such that 

- logPf" [\0-e\> -i] <-p(m, 8, -i] + e 



n ^ m ) — \ m 

-logP£i" a (|0- (0 + 5)| > -(m-i)j < -p (M,9 + 5,-{m-i) ) r 



for z = 0, ... ,tti and Vn > N. From the monotonicity fl42|) and the additivity (^) of 
quantum affinity, we perform the following evaluation: 

n^, ,, , 1 

ii,," ii ,i' j 



-^(a»IIpw) = -oW""" 



< io g (V {0 < of {e<e} i + pf" {0 + 5 < 0} 1 P J2 {0 + 5 < e] 
i=l



< 10KIP& 
+ — (i-1) <0<
m 



5 .1 5 



m 



O P^ 5 <!0 + -(7-i) <0<0 + -z 



5_A 5 

m 



- (0 + 5) 
0-0 



><5 +P 



>-*-l 

m 

5 



r 6>+<5 



0-0 
0- 



> 5 



> — ym — i) 
m 



6 -(6 + 5) >—(m-l)5} +P 



i=l 



m 



>-{i-r 

m 



0-0 



6»+<5 



> 5 



5 



7? 



5 



< log exp -- [p[ M,6,—(m- 1) -e + exp -- /3 M, + 5, 5 -e 



9 -(9 + 5) >— (m-i-1) 
m 

n ( „ / . 



5 



fi-r 



??? 



5 



Af,0 + <y,-(m-i-i; 
2 V— V m 



2 0<i<m V — V 



5 



< log(m + 2) exp -- min /3 M, 0, — (i - 1) + (3 [ M, 9 + 5, — (m - % - 1 



m 



5 



m 



2e 



7? 



2 \0<i<m — 



5 



log(m + 2) - - min (3 [M, 9, — (i - 1) + (3 [M, 9 + 5, — (m - i - 1) - 2e , 



777, 



5 



7» 



where we assume that P(M, 0, a) = for any negative real number a. Taking the limit 
n — > oo after dividing by n, we have 



2 0<i<m V — 



5 



77? 



8 



> - min [p [M, 9, — (i - 1) + P [M, 9 + 6, — (m - % - 1 



Since e > is arbitrary, the inequality 



2 0<i<m V — 



5 



777 



777 



5 



2e 



> - min [p[M,e,—(i-l))+P[M,9 + S,—(rn-i-l 



777 



holds. Taking the limit m —>■ oo, we obtain (|4 



Appendix F. Unitary evolutions on the boson coherent system 

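In the system $\mathcal{H} = L^2(\mathbb{R})$, the unitary operator $U_1(\beta) := \exp(-\beta a^* + \beta^* a)$ acts on the coherent state as

$$U_1(\beta)|\alpha\rangle = |\alpha-\beta\rangle$$

up to a phase factor, where $\alpha$ and $\beta$ are complex numbers and $a$ is the annihilation operator. Thus, we can verify that

$$U_1(\beta)\rho_\alpha U_1(\beta)^* = \rho_{\alpha-\beta}.$$

Now, we let $a_i$ be the annihilation operator on the $i$-th system. The unitary operator $U_n(\beta) := \prod_{i=1}^{n}\exp(-\beta a_i^* + \beta^* a_i)$ acts on the system $\mathcal{H}^{\otimes n}$ as

$$U_n(\beta)\rho_\theta^{\otimes n}U_n(\beta)^* = \rho_{\theta-\beta}^{\otimes n}.$$

In the two-mode system, the unitary $V_2(t) := \exp t(-a_2^* a_1 + a_1^* a_2)$ acts as

$$V_2(t)|\alpha_1\rangle\otimes|\alpha_2\rangle = |\alpha_1\cos t + \alpha_2\sin t\rangle\otimes|-\alpha_1\sin t + \alpha_2\cos t\rangle.$$

Thus, we can verify that

$$V_2(t)\,\rho_{\theta_1}\otimes\rho_{\theta_2}\,V_2(t)^* = \rho_{\theta_1\cos t+\theta_2\sin t}\otimes\rho_{-\theta_1\sin t+\theta_2\cos t}.$$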
Therefore, the unitary V n := n*=i ex P^i( — + satisfies 

V nPe V n — <&> Po 

where cos tj = J sin tj = 
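How a sequence of two-mode rotations can move the displacement of $n$ modes into a single mode is sketched below at the level of coherent amplitudes only. The choice of angles, $\tan t_i = 1/\sqrt{i}$ for the $i$-th rotation, is an assumption made for this illustration and is not taken from the garbled formula above; with it, $n$ equal amplitudes $\theta$ are mapped to $(\sqrt{n}\theta, 0, \ldots, 0)$.

```python
import math

def two_mode_rotation(alpha1, alpha2, t):
    """Amplitudes after the two-mode unitary: (a1 cos t + a2 sin t, -a1 sin t + a2 cos t)."""
    return (alpha1 * math.cos(t) + alpha2 * math.sin(t),
            -alpha1 * math.sin(t) + alpha2 * math.cos(t))

def concentrate(theta, n):
    """Mix mode 1 with modes 2..n in turn, using the assumed angles tan(t_i) = 1/sqrt(i)."""
    amps = [theta] * n
    for i in range(1, n):
        t = math.atan2(1.0, math.sqrt(i))
        amps[0], amps[i] = two_mode_rotation(amps[0], amps[i], t)
    return amps

amps = concentrate(0.7, 9)
```

After the loop the first amplitude is $\sqrt{9}\times 0.7$ and the remaining eight vanish, which mirrors the evolution $\rho_\theta^{\otimes n} \mapsto \rho_{\sqrt{n}\theta}\otimes\rho_0^{\otimes(n-1)}$ used in the construction of $M^w$.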

Appendix G. Proof of Proposition |8| 

For a proof of Proposition 8, we need the following lemma.

Lemma 21 Let g n (u),f n (u) be functions on Q. Assume that the functions (3i{uo) : = 
lim^oo — log/ n (a;) and (5 2 {ui) := lim^^ — logg n (uj) are continuous. If the inequality 
gn{^) < 1 holds for any element u 6 Q and any positive integer n, and if there exists a 
subset K gVL such that 

lim — log ( / f n (u) dw}> min (Pi{uj) + (3 2 (u)) , 



the relation 



lim — log ( / f n (uj)g n (uj) duA = min (Pi(uj) + (3 2 (u)) 



holds. 



Similarly to Lemma f|, Lemma ^ is proven 
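Now, we will prove Proposition 8. From the definition of $M^{w,n}$ and the equation

$$\rho_0 = \frac{1}{\overline{N}+1}\sum_k\left(\frac{\overline{N}}{\overline{N}+1}\right)^k|k\rangle\langle k|,$$

we have

$$P_0^{M^{w,n}}\{T_n > \epsilon\} = \sum_{k>n\epsilon^2}\frac{1}{\overline{N}+1}\left(\frac{\overline{N}}{\overline{N}+1}\right)^k = \left(\frac{\overline{N}}{\overline{N}+1}\right)^{[n\epsilon^2]+1},$$

where $[\ ]$ is a Gauss notation. Therefore, we obtain

$$\beta(M^w,0,\epsilon) = \epsilon^2\log\left(1+\frac{1}{\overline{N}}\right),$$

which implies (50).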
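The tail computation above is easy to reproduce. The sketch assumes the geometric photon-number law of the thermal state $\rho_0$ with mean `n_bar` and works with the logarithm of the tail so that large $n$ does not underflow.

```python
import math

def log_tail_probability(n_bar, eps, n):
    """log P{sqrt(k/n) > eps} when k follows the geometric law of the thermal state."""
    k_min = math.floor(n * eps ** 2) + 1   # smallest k with k > n eps^2
    return k_min * math.log(n_bar / (n_bar + 1.0))

def rate(n_bar, eps, n):
    return -log_tail_probability(n_bar, eps, n) / n

n_bar, eps = 1.0, 0.5
rates = [rate(n_bar, eps, n) for n in (10, 100, 10000)]
limit = eps ** 2 * math.log(1.0 + 1.0 / n_bar)
```

The rates converge to `limit`, that is, to $\epsilon^2\log(1+1/\overline{N})$.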

Next, we prove the strong consistency condition and floTp . We perform the following 
calculation: 

P m w ' n { Tn - 9 > e} = J2 (k\ [ ^\a)(a\e~ h ^ 1 d 2 a\k) 

k>{d+ ^n JC7CN 

= [^e-^ V !^e-HV a . (G.1) 



k>(e-e) 2 n 
The equation

-1. v 7 ^ -n\^£f \a-6\ 2 
hm — log^=e n = — = — (G.z) 

holds. Also, as is proven in the latter, the equations 

Urn— Wl V i^B! e -H 2 



-iog( E 

\fe>(e+ e ) 



n— >oo 77, \ ^-^ k\ 

" \-e) 2 n 

+ ef log + |cf - (61 + e) 2 ^) 1((9 + e) 2 - |«| 2 ) (G.3) 

a / 



lim — log ( 

n^oo -n, 1 ' ^ 

\fc<(6>-6 



(n|a| 2 )\ _ nH 2 



e) 2 n 

e) 2 log^2 + |«| 2 - (0 - e) 2 ") - e) 2 + |a| 2 ) (G.4) 

ar / 



hold, where l(x) is defined as 
l(x) = 



1 x > 
x < 0. 



For any 5 > 0, there exists a real number such that 

\. ( t ypR ( \a-d\ 2 \ ,\ K-6 
lim log / exp — n — = — ctx = — = — > 0. 

n-»oc n 6 VyH>^vr^ H \ N J J N 

Now, we can apply Lemma [Zl] to (p.lj) . From ( |G.2| ) and ( p.3| ), the relations 
lim- logVr" {T n -e >6} 

= m in ( J^!! + ((9 + ef log + \a\ 2 - (9 + e) 2 ^) + e) 2 - \a\ 

aec \ N \ \a.\ J 

min f tj=^ + 6(9 + ef log ^±£>! + | a | 2 - (9 + ^) 1((9 + e) 2 - |a| 



iV V I" 

™« I AT + ( ( * + bg W$ + {9 - S? - (9 + ef ) ^ + ^ " ( ^ " S 
hold. If e is sufficiently small for 9, we have the following approximation: 

lim — log Pf {T n - 9 > e\ ^ min _ s =e + = T . 

™ n 9 1 1 - iV \ 1 + 2AT ; A^ + i 

Thus,

$$\lim_{\epsilon\to 0}\lim_{n\to\infty}\frac{-1}{n\epsilon^2}\log P_\theta^{M^{w,n}}\{T_n-\theta > \epsilon\} = \frac{1}{\overline{N}+\frac{1}{2}}. \tag{G.5}$$

The second convergence of the LHS of (G.5) is uniform in a sufficiently small neighborhood $U_\theta$ of arbitrary $\theta \in \mathbb{R}^+ \setminus \{0\}$.

Similarly to (G.5), from (G.4), we can prove

$$\lim_{\epsilon\to 0}\lim_{n\to\infty}\frac{-1}{n\epsilon^2}\log P_\theta^{M^{w,n}}\{T_n-\theta < -\epsilon\} = \frac{1}{\overline{N}+\frac{1}{2}}. \tag{G.6}$$

Also, the second convergence of the LHS of (G.6) is uniform in a sufficiently small neighborhood $U_\theta$ of arbitrary $\theta \in \mathbb{R}^+ \setminus \{0\}$. Thus, (51) and the strong consistency condition are proven.

Next, we prove ( |G.3| ) and ( p.4| ). Using the Stirling formula, we have 

Hm — log ( n l"l 2 ) [fal e -n|«l a = ( 6 i og JL + \ a \ -A 1(6 -\a\ 2 ). (G.7) 
n^oo n [on\\ \ \a\ z J 

Since the relations 

H^-^\ - n ^, V W)* ^^i^g^m-i) 
{[{9-efn\-l)\ - k< ^ €)2n k] - e) nl (i(e - eyn] - 1)\ 6 

hold, O follows from (|077D . If (0 + e) 2 < |a| 2 , the equation 



|2U 

,12 



lim — log Y (W 7 ; e-"l"l 2 = (G.8) 

n->oo 77, fe! 

fc>(6»+e) 2 n 

holds. It implies ( |G.3[ ) in the case of (6* + e) 2 < |a| 2 . 

Next we prove ( |G.3[ ) in the case of (0 + e) 2 > |a| 2 . In this case, we have 

E V*-** * <L - (S + e « (G.9) 

Ln>fc>(6M-e) 2 n ' 

because (iH|pV"H 2 ) / ( (n| ff e~"H 2 ) = If L and N are sufficiently large 

for \a\ 2 , we have 

E ^-*H= < E e - = ( G ,0) 

k>Ln k>Ln 

because ( |G.7|) implies that 

(nH 2 )^ e _„ H2 < e _ [5n] w >L y n> N _ 



[5n] 

Since the relations 

(n|al a )K^) a "i M , v (n|a| a )* - nM 
\(6 + e) 2 n]\ ~ 2s k\ 



2 



< n{L _ + e) 2 ) l X' J .„ - e-H- + 



2 e 



-nL 



[(# + e) 2 n]! 1-e" 

hold, we have 



O 2 lotf^±!£+|a| 2 -(fl 



2\fc 

J__-n|a| a 



,. — 1 , / 

> hm — log > — 

n->oo n I * — ' fc! 

\fc>(0+e) 2 n 

> m in|^ + e ) 2 log^^ + |a| 2 -(^ + e ) 2 ) ,L\. 
If we let L be a sufficiently large real number, we have (|G.3| ). 
Appendix H. Proof of Theorem 11



In this proof, we use the function <j) e g(s) defined in (|K.1| ). First, we prove the following 
four facts. 

(i) The faithful POVM Mf satisfies the inequalities 

P(Mf,9,e) > 0, a(M f ,9) > 0. 

(ii) The relation 



limlTrpW^- TiPdLt 



Jo 



Jo 



-1 



Jo, \/6e<d 



holds, 
(iii) The equation 



lim 



WW - 1 1 



J A 



(H.l) 



holds. The LHS converges uniformly w.r.t. 9, 9. 

(iv) For any real number 82 > 0, there exists a sufficiently small real number e > 
such that if | Tr pgL§ — Tr pe'L§\ < e(l — 82) and \9 — 9\ < y 7 ?, then \9' — 9\ < e. 

Fact (i) is easily proven from the definition of Mf. Fact (iii) is proven by the relation 
\L$ Trp e L § 



sup 



< 00. 



J§ J§ 
Fact (ii) is, also, proven by the relations 



Tr, 



Lq_ _ Tr p g L 6 



Ja 



Trpe(Ll) (Trp^ 



Jx 



Ja 1 as 9^ 9. 



Fact (iv) follows from the relation 
<9Tr p e Ls 



06 



1 as 9 -> 9, 



which follows from fact (i). 

Next, we prove the theorem from the preceding four facts. The inequality 



<Pf fXSn {9eU e>Vi } sup P 



LgX (1— 5)n 



{9^U e , e } + Pf fXSn {9^U e ^} 



eeu. 



(H.2) 



holds. As is proven in the latter, the inequality 



lim inf log sup P 



L 6 x(l-S)n trpn 



n—>oo fl 
( 



> (1-% 



V 



,1 

2 



5 2 (l-5 2 ) 2 -|Trp fl |^ 



J± TrpeLt 
Ja 



Ja 



-1 



(H.3) 
holds, where the function g(x,y) is defined as g(x,y) := x — log(l + | +y). Therefore,
we have 

/3(M§,e,e) = liminf — logPf* {9 £ U e ^} 



n— >oo Tl 



.2 M cn 2 1 ^ Trp LA 2 \ 1 e 2 (l-5 2 ) 2 



> min^ (1-5)/. | e 2 (l-5 2 f-|^Tr p e ^-^pj ) , ^ - ^ J | , 

•0({MfX6n},9,Ve)\. (H.4) 



From facts (i) and (ii), the equations 
lim .-(RHS of (H.4)) 

e^O e 2 



-1 



1-5 



((l-5 1 ) 2 (l-5 2 ) 2 J,-(l-5 2 ) 2 5 3 ) (H.5) 

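hold. The RHS of (H.5) converges locally uniformly w.r.t. $\theta$. Let $\beta_m(M^s_\delta,\theta,\epsilon)$ be the RHS of (H.4) in the case of $\delta_2 = \delta_3 = \frac{1}{m}$. Therefore, we have

$$\lim_{m\to\infty} \lim_{\epsilon\to 0} \frac{1}{\epsilon^2} \beta_m(M^s_\delta,\theta,\epsilon) = \frac{1-\delta}{2} J_\theta,$$

which implies that

$$\alpha(M^s_\delta,\theta) \ge \frac{1-\delta}{2} J_\theta.$$

If the converse inequality

$$\alpha(M^s_\delta,\theta) \le \frac{1-\delta}{2} J_\theta \tag{H.6}$$

holds, we can immediately derive relations ( p5|) and show that the sequence of estimators $M^s_\delta$ satisfies the second strong consistency condition.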

In the following, the relations (H.6) and (H.3) are proven. First, we prove (H.6).



M 3,n 

We can evaluate the probability P e 4 {9 G Ug te j as 
-logPf n {9 G [/ e ,J = -log / pf' xfa (d0)Pj« xMn {7? g 



< _ J pff X5n (d9)\0g (p^-6)n {T n ^ ^ } 

< - / P/ (t£0) -j- 



P 



where P e ^ en := P^+^ en ^ an d similarly to (pop, we can prove the last 

inequality. For any <5 4 > 0, we have 



limsup--logPf l {Tn^,J 

n^oo Tl 
< limsup / p^ /X<5n ( d6){l — 5) min



1 - 8)D L 6{6 + £e||0) + fc(Pfl +^ J 



€=i-*,-(i-*) (1 - 5)P^ e n 

= (1-5) min D^i(e + ee||fl) = ^J„. 

£=l-<5 4 ,-(l-<S 4 ) 2 

The last equation is derived from Lebesgue's convergence theorem and the fact that the 
probability P g ^ en tends to 1 uniformly w.r.t. 9, as follows from Assumptions 1 and 3. 

The reason for the applicability of Lebesgue's convergence theorem is given as 
follows. Since P,9+g eri tends to 1 uniformly w.r.t. 9, there exists N, R > such that 
Pg+ ?e ,„ > ]=;, V# G 0, n > iV. Thus, we have 

1 9+ ^ n> < - ((1 - 5)D(6 + e£ 9) + 2) < oo. 

P » 1—0 

Therefore, we can apply Lebesgue's convergence theorem. Thus, the relations 
a(M s s ,6) = limsuplimsup-^logPf 1 !^ i U e , e } 

e^O n^oo Tl€ 

< (1-5) limsup 4 min D L H6 + £e||0) 
e 2 e=i-<54-(i-5 4 ) 

= (l-5)(l-5 4 ) 2 ^J e 

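hold. Since $\delta_4 > 0$ is arbitrary, the inequality (H.6) holds.

Next, we prove the inequality (H.3). Assume that $|\tilde\theta - \theta| < \epsilon$ and define

$$\Lambda(\epsilon,\theta,\tilde\theta) := \sup_s \left( s\epsilon - \log \phi_{\theta,\tilde\theta}(s) \right).$$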



Then, the inequalities 

V L d ^ 5)n {6 £ Ue, e } < P^ x(1 ^ )n {|Tr P§ L § - Tr Pe L § \ < (1 - 5 2 )e} (H.7) 
< 2 exp (-(1 - o>min {A((l - 6 2 )e, 9, 6), A(-(l - 5 2 )e, 0, 0)})L8) 

hold, where (H.7) is derived from fact (iv), and (H.8) is derived from Markov's inequality.
Thus, 

lim-Ilog sup Po SX{1 ~ S)n {9<£U 6!e } 



6&U. 



6, -/i 



>(l-5) inf min{A((l-(J 2 )e,0,0),A(-(l-(J 2 )e,0,0)}. (H.9) 



We let $\epsilon > 0$ be a sufficiently small real number for arbitrary $\delta_3 > 0$ and define $\eta$ by



Then, the inequalities 

A(±(l-5 2 )e,0,0) 
> ±(l-5 2 )e(±r ] )-\og^(±r ] ) 



2- - 1 
2 N -1



t\l-5f I / _ (L 6 Trp,L^ 2X 1 



log | 1 + v . 7 | I Trp, - I + <M I (H.10) 

hold, where (H.10) follows from fact (iii). The uniformity of (H.1) (fact (iii)) and the boundedness of the RHS of (H.1) (Assumption 3) guarantee that the choice of $\epsilon > 0$ is independent of $\theta, \tilde\theta$. From (H.9) and (H.10), we obtain (H.4) because the function $x \mapsto g(x,y)$ is monotonically increasing where $y, x > 0$.
If the true state is $\rho_{\theta_1}$, the inequalities

< pjf' x ^{0 ^ c^j sup e n(i-s n -^)»(em)p^ {uj) < P ^ (w) 



< 1 x sup e~" (1 " <5 "-^ )D(e " l|9l) 
hold. Since (1 — 5 n _^) — > 1, we have 



lim— logP,/ 1 {T n ^ C/^J = inf D(0||0i). 

Thus, equation (56) is proven. Then, it implies (0).

Next, we show the weak consistency of $M^w$. Assume that the true state $\rho_\theta$ is not $\rho_{\theta_1}$. Then, we have

M w ' n 

P e 61 {T n i Ug i£n } 

< Pf^{9 i U , €n } 

+ pMf^ { e eUeen} sup pfl ) e n(l-S n ^)D m e l)p % ( W )> P ^ (w) 1(11) 

where e n := -i — Z3 ( 6 'll 6> i) x Since 5 n = -t-, the convergence P.^ /X ^{(9 ^ 

2 | Tr ^s( lo SPe- lo gPei) 

Ug ten } — > holds. Also, the relation [/6», en C Ue,e n _^ holds. If we can prove 



8£Ug 

we obtain 



sup P? 1 ^-W6\\e l)v E H {u) > p ^ (w) j ^ 0> (L2) 



M w ' n 

P/ 1 {T n g C/ , e J - 0. (1.3) 



This condition (I.3) is stronger than the weak consistency condition. Thus, it is sufficient
to show (I.2).
From Lemma IH1 the relations



E n I 1 / E n E n \ 

p/ 1 1- (- log p/ 1 ( W ) + log P ei ei MJ + > <W0||0O 

P/ 1 1 - (- log P fl - 91 H + log P ei fll (u))+Trp e (log p 6 - log p fll ) 
> 5 n D{e\\9 x ) + Tr(p e - pg)(\ogp § - logp ei 



s n r 1 E n 

< P/ 1 |--logP/» + Trp e logp^ > 5 n D{9\\d x ) + Tr(p e - p § )(logp § - logp^ 

E n f 1 E n I 

+ P/ 1 j-logP^H " Tr p e logp ei > <f n £>(0||0i) + Tr(p, - pg)Qogp 6 - log^jj 

< exp — I n sup (5 n £>(0||6>i) + Tr(p e - p e -)(logpg - logpej - Trp^logpg) t 

V o<t<i 

4 (fe + l)log(n + l) . 
-t logTrpep, 



+ exp - ( nsup (5 n D(8\\6i) + Tr(p e - p§)(\ogp§ - logp 01 ) + Trp logp ei ) t - logTrp^p^ 



(1.4) 



hold. In the following, we assume that \0 — 6\ < e n . Since e n = -1 — J> ( 6) ll 6 'i) x^ we 

2 | Tr ^#"( lo g P0- lo gPfli)| 

can derive 5 n D(^||^i) + Tr(p e -pg)(logp e --logp ei ) < \D{B\\e 1 )8 n + 0{8l). Substituting 
t = s5 n , we have 

SU P ~H \ n SU P ( S nD{9\\6i) + Tr(p e - p -)(logp e - - \ogp dl ) - Tr p e log p s )t 
eau e ,, n nd n V o<t<i 

(fc + l)tog(n+l) _ t 

- i log Tr p e p - 

n y 

> sup 1 f (^(g||gi)<?n + 0(0 - Tr p e log P6 )s6 n - s5 n {k + 1} l0g(n + 1} 
+ Tr p e logp -S(5 n - ^(Tr p e (logpe) 2 - (Tr p e \ogp § ) 2 )s 2 5 2 n + 0(5*] 

> SUP l(WK+o«)-^ ( * +1)lort " +1) 

- i(Tr p e (logpe) 2 - (Trp e logpg) 2 )^ + C>(C 
-> - - (Trp e (logp e ) 2 - (Tr p e \ogp e ) 2 ) s 2 ( as n -> 00) 



~ (Trp e (logp e ) 2 - (Tr p e logp e ) 2 ) 



2(Trp e (logp e ) 2 - (Trp e logp e ) 2 ) 
, m\Oi) 2

8(Trp e (logp e ) 2 -(Trp e logp e ) 2 )- 
Thus, we have 

lim sup — ( n sup {5 n D{9\\9 x ) + Tr(p - p § )(logp§ - logpej - Trp e log p§)t 

feU B , tn n °n V 0<t<l 



"— ^ ,: , - n5 2 

4 (fc + l)log(n + l) _ t 
i log Tr p e p. 



n 

, o ( i5) 

" 8(Trp,(logp,) 2 -(Trp,logp e ) 2 ) ' K '> 
Also, we obtain 

sup — — ( nsup(5 n D(0||0i) + Tr(p e - pg)(logp e ~ - logp^J + Tr p e logp 01 )t - logTr 



> sup — ( (-D(9\\9 1 )6 n + 0(6l) + Tr p e log p 6l ) sS n - Tr p e log p ei s6 n 
SeU e , tn °n V 1 

- l(Trp e (logp ei ) 2 - (Trp e log p 9l ) 2 )sX + 0(% 

= _sup 1 (Ql^)* - \ (Tr p,(logp ei ) 2 - (Trp,logp ei ) 2 ) s 2 ) 5 2 n + 0(<5*)) 
1 i 

-> -D(0||0i)s - -(Trp e (logp ei ) 2 - (Trp e logp e J 2 )s 2 (asn^ oo). 
Therefore, 



1 

9eC/ fl , e „ '^n 



lim sup — — nsnp(5 n D(9\\9 1 ) + Tr(p e - p § )(logp§ - logpej + Trp e logp ei )t - logTr 

H-KXJ^,, 7l<)„ \ 0<t 



> « >o. 

8(Trp (logp ei ) 2 - (Trp e logp 01 ) 2 ) 
Since $n\delta_n^2 \to \infty$, relation (I.2) follows from (I.4), (I.5) and (I.6).



Appendix J. Pinching map and group theoretical viewpoint 

Appendix J.1. Pinching map in non-asymptotic setting

In the following, we prove Lemma 14 and construct the PVM $E^n_\rho$ after some discussions concerning the pinching map in the non-asymptotic setting and group representation theory. In this subsection, we present some definitions and discussions of the non-asymptotic setting.

A state $\rho$ is called commutative with a PVM $E (= \{E_i\})$ on $\mathcal{H}$ if $\rho E_i = E_i \rho$ for any index $i$. For PVMs $E (= \{E_i\}_{i\in I})$, $F (= \{F_j\}_{j\in J})$, the notation $E \le F$ means that for any index $i \in I$ there exists a subset $(F/E)_i$ of the index set $J$ such that $E_i = \sum_{j\in(F/E)_i} F_j$. For a state $\rho$, we denote by $E(\rho)$ the spectral measure of $\rho$, which can be regarded as a PVM. The pinching map $\mathcal{E}_E$ with respect to a PVM $E$ is defined as

$$\mathcal{E}_E : \rho \mapsto \sum_i E_i \rho E_i, \tag{J.1}$$

which is an affine map from the set of states to itself. Note that the state $\mathcal{E}_E(\rho)$ is commutative with the PVM $E$. If a PVM $F = \{F_j\}_{j\in J}$ is commutative with a PVM $E = \{E_i\}_{i\in I}$, we can define the PVM $F \times E = \{F_j E_i\}_{(i,j)\in I\times J}$, which satisfies $F \times E \ge E$ and $F \times E \ge F$. For any PVM $E$, the supremum of the dimension of $E_i$ is denoted by $w(E)$.

Lemma 22 Let $E$ be a PVM such that $w(E) < \infty$. If states $\sigma$ and $\rho$ are commutative with the PVM $E$, and if a PVM $F$ satisfies $E \le F$, $E(\sigma) \le F$, then we have

$$D(\rho\|\sigma) - \log w(E) \le D(\mathcal{E}_F(\rho)\|\mathcal{E}_F(\sigma)) \le D(\rho\|\sigma).$$

This lemma follows from Lemma 23 and Lemma 24 below.

Lemma 23 Let $\rho$ and $\sigma$ be states. If a PVM $F$ satisfies $E(\sigma) \le F$, then

$$D(\rho\|\sigma) = D(\mathcal{E}_F(\rho)\|\mathcal{E}_F(\sigma)) + D(\rho\|\mathcal{E}_F(\rho)). \tag{J.2}$$

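Proof: Since $E(\sigma) \le F$, the PVM $F$ is commutative with $\sigma$, so that $\mathcal{E}_F(\sigma) = \sigma$ and $\mathrm{Tr}\,\mathcal{E}_F(\rho) \log \mathcal{E}_F(\sigma) = \mathrm{Tr}\,\rho \log \sigma$. Since $F$ is commutative with $\log \mathcal{E}_F(\rho)$, we have $\mathrm{Tr}\,\mathcal{E}_F(\rho) \log \mathcal{E}_F(\rho) = \mathrm{Tr}\,\rho \log \mathcal{E}_F(\rho)$. Therefore, we obtain the following:

$$D(\mathcal{E}_F(\rho)\|\mathcal{E}_F(\sigma)) - D(\rho\|\sigma) = \mathrm{Tr}\,\mathcal{E}_F(\rho)\left(\log \mathcal{E}_F(\rho) - \log \mathcal{E}_F(\sigma)\right) - \mathrm{Tr}\,\rho\left(\log\rho - \log\sigma\right)$$

$$= \mathrm{Tr}\,\rho\left(\log \mathcal{E}_F(\rho) - \log\rho\right) = -D(\rho\|\mathcal{E}_F(\rho)).$$

This proves (J.2). ■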
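
As a numerical sanity check of (J.2), the following minimal sketch evaluates both sides for a single qubit. The particular matrices, the choice of $F$ as the computational basis (so that a diagonal $\sigma$ satisfies $E(\sigma) \le F$), and all function names are illustrative assumptions, not part of the original argument.

```python
import math

def eigvals_2x2(a, b, d):
    """Eigenvalues of the real symmetric matrix [[a, b], [b, d]]."""
    mean = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    return mean - radius, mean + radius

def xlogx(x):
    """x * log(x) with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0.0 else x * math.log(x)

def check_j2(rho, sigma_diag):
    """rho = (a, b, d) encodes the qubit state [[a, b], [b, d]];
    sigma = diag(s0, s1) commutes with F = computational basis.
    Returns (D(rho||sigma), D(E_F rho||E_F sigma), D(rho||E_F rho))."""
    a, b, d = rho
    s0, s1 = sigma_diag
    tr_rho_log_rho = sum(xlogx(lam) for lam in eigvals_2x2(a, b, d))
    # sigma is diagonal, so Tr rho log sigma only involves the diagonal of rho
    tr_rho_log_sigma = a * math.log(s0) + d * math.log(s1)
    d_full = tr_rho_log_rho - tr_rho_log_sigma
    # the pinched states are the classical distributions (a, d) and (s0, s1)
    d_pinched = a * math.log(a / s0) + d * math.log(d / s1)
    # Tr rho log E_F(rho) = a log a + d log d
    d_loss = tr_rho_log_rho - (xlogx(a) + xlogx(d))
    return d_full, d_pinched, d_loss

d_full, d_pinched, d_loss = check_j2((0.7, 0.2, 0.3), (0.6, 0.4))
```

Here `d_full` agrees with `d_pinched + d_loss` up to rounding, and `d_loss` is non-negative, which is the content of (J.2) in this special case.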

Lemma 24 Let $E$ and $F$ be PVMs such that $E \le F$. If a state $\rho$ is commutative with $E$, we have

$$D(\rho\|\mathcal{E}_F(\rho)) \le \log w(E). \tag{J.3}$$

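Proof: Let $a_i := \mathrm{Tr}\,E_i \rho E_i$ and $\rho_i := \frac{1}{a_i} E_i \rho E_i$. Then, we have $\rho = \sum_i a_i \rho_i$, $\mathcal{E}_F(\rho) = \sum_i a_i \mathcal{E}_F(\rho_i)$, $\sum_i a_i = 1$. Therefore,

$$D(\rho\|\mathcal{E}_F(\rho)) = \sum_i \mathrm{Tr}\,E_i \rho \left(\log\rho - \log\mathcal{E}_F(\rho)\right) = \sum_i \mathrm{Tr}\,E_i \rho E_i \left(E_i \log\rho\, E_i - E_i \log\mathcal{E}_F(\rho) E_i\right)$$

$$= \sum_i a_i D(\rho_i\|\mathcal{E}_F(\rho_i)) \le \sup_i D(\rho_i\|\mathcal{E}_F(\rho_i)) = \sup_i \left(\mathrm{Tr}\,\rho_i \log\rho_i - \mathrm{Tr}\,\mathcal{E}_F(\rho_i) \log\mathcal{E}_F(\rho_i)\right)$$

$$\le \sup_i \left(-\mathrm{Tr}\,\mathcal{E}_F(\rho_i) \log\mathcal{E}_F(\rho_i)\right) \le \sup_i \log\dim E_i = \log w(E).$$

Thus, we obtain inequality (J.3). ■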
Let us consider another type of inequality. 

Lemma 25 Let $E$ be a PVM such that $w(E) < \infty$. If the state $\rho$ is commutative with $E$, and if a PVM $M$ satisfies $M \ge E$, we have

$$\rho \le \mathcal{E}_M(\rho)\, w(E) \tag{J.4}$$

$$\rho^{-t} \ge \mathcal{E}_M(\rho)^{-t}\, w(E)^{-t} \tag{J.5}$$

for $0 \le t \le 1$.
Proof: It is sufficient for (J.4) to show

$$\rho \le k\, \mathcal{E}_M(\rho), \tag{J.6}$$

for any state $\rho$ and any PVM $M$ on a $k$-dimensional Hilbert space $\mathcal{H}$. Now, it is sufficient to prove (J.6) in the pure state case. For any $\phi, \psi \in \mathcal{H}$, we have

$$\langle\psi|\left( k\,\mathcal{E}_M(|\phi\rangle\langle\phi|) - |\phi\rangle\langle\phi| \right)|\psi\rangle = k \sum_{i=1}^k \langle\psi|M_i|\phi\rangle\langle\phi|M_i|\psi\rangle - \left| \sum_{i=1}^k \langle\psi|M_i|\phi\rangle \right|^2 \ge 0.$$

The last inequality follows from the Schwarz inequality for the vectors $\{\langle\psi|M_i|\phi\rangle\}_{i=1}^k$ and $\{1\}_{i=1}^k$. It is well known that the function $u \mapsto -u^{-t}$ $(0 \le t \le 1)$ is an operator monotone function [40]. Thus, (J.4) implies (J.5). ■
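
The pure-state step of this proof can be checked numerically. The sketch below is only an illustration under stated assumptions: $M$ is taken to be the rank-one PVM onto the standard basis of $\mathbb{C}^k$, the vectors are random and unnormalized (the inequality is homogeneous), and the helper names are hypothetical.

```python
import random

def pinching_gap(phi, psi):
    """<psi| (k * E_M(|phi><phi|) - |phi><phi|) |psi> for the PVM M made of
    the rank-one projections onto the standard basis of C^k."""
    k = len(phi)
    # <psi|M_i|phi><phi|M_i|psi> = |psi_i|^2 |phi_i|^2
    diag_term = k * sum(abs(p) ** 2 * abs(f) ** 2 for p, f in zip(psi, phi))
    # <psi|phi> = sum_i <psi|M_i|phi>
    overlap = sum(p.conjugate() * f for p, f in zip(psi, phi))
    return diag_term - abs(overlap) ** 2

def random_vector(k, rng):
    """Random complex vector with Gaussian real and imaginary parts."""
    return [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(k)]

rng = random.Random(0)
gaps = [pinching_gap(random_vector(4, rng), random_vector(4, rng))
        for _ in range(1000)]
min_gap = min(gaps)
```

The value `min_gap` stays non-negative up to rounding, and the gap vanishes when all the numbers $\langle\psi|M_i|\phi\rangle$ coincide, which is the equality case of the Schwarz inequality.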

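Lemma 26 If a PVM $M$ is commutative with a state $\sigma$ and $w(M) = 1$, we have

$$P^M_\rho\{\log P^M_\sigma(\omega) \ge a\} \le \exp\left( -\sup_{t \ge 0} \left( at - \log \mathrm{Tr}\,\rho\sigma^t \right) \right) \tag{J.7}$$

for any state $\rho$.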

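Proof: From Markov's inequality, for any $t \ge 0$ we have

$$p\{X \ge a\} \le \exp\left(-\Lambda_t(X,p,a)\right) \tag{J.8}$$

$$\Lambda_t(X,p,a) := at - \log \int e^{tX(\omega)} p(d\omega).$$

Since $w(M) = 1$, the relation $\sum_\omega P^M_\rho(\omega) P^M_\sigma(\omega)^t = \mathrm{Tr}\,\mathcal{E}_M(\rho)\mathcal{E}_M(\sigma)^t$ holds. It yields

$$\Lambda_t(\log P^M_\sigma, P^M_\rho, a) = at - \log \mathrm{Tr}\,\mathcal{E}_M(\rho)\mathcal{E}_M(\sigma)^t = at - \log \mathrm{Tr}\,\rho\sigma^t.$$

Thus, we obtain (J.7). ■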
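
When $\rho$ and $\sigma$ commute with each other and with $M$, the bound (J.7) reduces to a classical Chernoff-type bound for two probability vectors. The following sketch checks that special case only; the vectors `p` and `q`, the grid over $t \ge 0$, and the function names are assumptions made for illustration.

```python
import math

def tail_probability(p, q, a):
    """P_p{ log q(w) >= a } for probability vectors p and q."""
    return sum(pw for pw, qw in zip(p, q) if math.log(qw) >= a)

def chernoff_bound(p, q, a, t_grid):
    """exp(-max_t (a*t - log sum_w p(w) q(w)^t)) over a grid of t >= 0.
    Every grid point gives a valid bound, so the grid maximum does too."""
    exponent = max(
        a * t - math.log(sum(pw * qw ** t for pw, qw in zip(p, q)))
        for t in t_grid
    )
    return math.exp(-exponent)

p = [0.5, 0.3, 0.2]   # plays the role of rho (diagonal)
q = [0.1, 0.2, 0.7]   # plays the role of sigma (diagonal)
t_grid = [i / 100 for i in range(0, 501)]
```

Because the grid contains $t = 0$, the computed bound never exceeds 1, and for each threshold $a$ the tail probability lies below it.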

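Lemma 27 Assume that $E$ and $M$ are PVMs such that $w(E) < \infty$, $w(M) = 1$ and $M \ge E$. If the states $\rho$ and $\rho'$ are commutative with $E$, we have

$$P^M_\rho\{-\log P^M_{\rho'}(\omega) \ge a\} \le \exp\left( -\sup_{0 \le t \le 1} \left( (a - \log w(E))t - \log \mathrm{Tr}\,\rho\rho'^{-t} \right) \right). \tag{J.9}$$

Proof: If $0 \le t \le 1$, we have

$$\Lambda_t(-\log P^M_{\rho'}, P^M_\rho, a) = at - \log \mathrm{Tr}\,\mathcal{E}_M(\rho)\mathcal{E}_M(\rho')^{-t} = at - \log \mathrm{Tr}\,\rho\,\mathcal{E}_M(\rho')^{-t}$$

$$\ge at - \log w(E)^t\, \mathrm{Tr}\,\rho\rho'^{-t} \tag{J.10}$$

$$\ge (a - \log w(E))t - \log \mathrm{Tr}\,\rho\rho'^{-t}, \tag{J.11}$$

where (J.10) follows from Lemma 25. Therefore, from (J.8) and (J.11), we obtain (J.9). ■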
Appendix J.2. Group representation and its irreducible decomposition

In this subsection, we consider the relation between irreducible representations and PVMs for the purpose of constructing the PVM $E^n_\rho$ and a proof of Lemma 14. Let $V$ be a finite-dimensional vector space over the complex numbers $\mathbb{C}$. A map $\pi$ from a group $G$ to the general linear group of a vector space $V$ is called a representation on $V$ if the map $\pi$ is homomorphic, i.e., $\pi(g_1)\pi(g_2) = \pi(g_1 g_2)$, $\forall g_1, g_2 \in G$. The subspace $W$ of $V$ is called invariant with respect to a representation $\pi$ if the vector $\pi(g)w$ belongs to the subspace $W$ for any vector $w \in W$ and any element $g \in G$. The representation $\pi$ is called irreducible if there is no proper nonzero invariant subspace of $V$ with respect to $\pi$. Let $\pi_1$ and $\pi_2$ be representations of a group $G$ on $V_1$ and $V_2$, respectively. The tensored representation $\pi_1 \otimes \pi_2$ of $G$ on $V_1 \otimes V_2$ is defined as $(\pi_1 \otimes \pi_2)(g) = \pi_1(g) \otimes \pi_2(g)$, and the direct sum representation $\pi_1 \oplus \pi_2$ of $G$ on $V_1 \oplus V_2$ is also defined as $(\pi_1 \oplus \pi_2)(g) = \pi_1(g) \oplus \pi_2(g)$.

In the following, we treat a representation $\pi$ of a group $G$ on a finite-dimensional Hilbert space $\mathcal{H}$. The following fact is crucial in later arguments. There exists an irreducible decomposition $\mathcal{H} = \mathcal{H}_1 \oplus \cdots \oplus \mathcal{H}_l$ such that the irreducible components are orthogonal to one another if for any element $g \in G$ there exists an element $g^* \in G$ such that $\pi(g)^* = \pi(g^*)$, where $\pi(g)^*$ denotes the adjoint of the linear map $\pi(g)$. We can regard the irreducible decomposition $\mathcal{H} = \mathcal{H}_1 \oplus \cdots \oplus \mathcal{H}_l$ as the PVM $\{P_{\mathcal{H}_i}\}_{i=1}^l$, where $P_{\mathcal{H}_i}$ denotes the projection to $\mathcal{H}_i$. If two representations $\pi_1$ and $\pi_2$ satisfy the preceding condition, the tensored representation $\pi_1 \otimes \pi_2$ also satisfies it. Note that in general, an irreducible decomposition of a representation satisfying the preceding condition is not unique. In other words, we cannot uniquely define the PVM from such a representation.

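Appendix J.3. Construction of PVM $E^n_\rho$ and the tensored representation

In this subsection, we construct the PVM $E^n_\rho$ after the discussion of the tensored representation. Let the dimension of the Hilbert space $\mathcal{H}$ be $k$. Concerning the natural representation $\pi_{\mathrm{SL}(\mathcal{H})}$ of the special linear group $\mathrm{SL}(\mathcal{H})$ on $\mathcal{H}$, we consider its $n$-th tensored representation $\pi^{\otimes n}_{\mathrm{SL}(\mathcal{H})} := \underbrace{\pi_{\mathrm{SL}(\mathcal{H})} \otimes \cdots \otimes \pi_{\mathrm{SL}(\mathcal{H})}}_{n}$ on the tensor product space $\mathcal{H}^{\otimes n}$. For any element $g \in \mathrm{SL}(\mathcal{H})$, the relation $\pi_{\mathrm{SL}(\mathcal{H})}(g)^* = \pi_{\mathrm{SL}(\mathcal{H})}(g^*)$ holds, where the element $g^* \in \mathrm{SL}(\mathcal{H})$ denotes the adjoint matrix of the matrix $g$. Consequently, there exists an irreducible decomposition of $\pi^{\otimes n}_{\mathrm{SL}(\mathcal{H})}$ regarded as a PVM, and we denote the set of such PVMs by $\mathrm{Ir}^{\otimes n}$.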

From Weyl's dimension formula ((7.1.8) or (7.1.17) in Weyl and Goodman and Wallach), the $n$-th symmetric tensor product space is the maximum-dimensional space in the irreducible subspaces with respect to the $n$-th tensored representation $\pi^{\otimes n}_{\mathrm{SL}(\mathcal{H})}$. Its dimension equals the repeated combination ${}_k H_n$ evaluated by

$${}_k H_n = \binom{n+k-1}{k-1} = \binom{n+k-1}{n} = {}_{n+1} H_{k-1} \le (n+1)^{k-1}.$$

Thus, any element $E^n \in \mathrm{Ir}^{\otimes n}$ satisfies:

$$w(E^n) \le (n+1)^{k-1}. \tag{J.12}$$
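
The polynomial bound behind (J.12) is easy to check for small cases. The sketch below assumes only the standard binomial formula for the dimension of the symmetric tensor power; the function name `symmetric_dim` is hypothetical.

```python
import math

def symmetric_dim(k, n):
    """Dimension of the n-th symmetric tensor power of a k-dimensional
    space, i.e. the number of multisets of size n drawn from k items."""
    return math.comb(n + k - 1, k - 1)

# Compare the dimension with the polynomial bound (n + 1)^(k - 1).
bound_holds = all(
    symmetric_dim(k, n) <= (n + 1) ** (k - 1)
    for k in range(1, 7)
    for n in range(0, 31)
)
```

For a qubit ($k = 2$) the bound is attained with equality, since the symmetric subspace of $n$ qubits has dimension $n + 1$.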
Lemma 28 A PVM $E^n \in \mathrm{Ir}^{\otimes n}$ is commutative with the $n$-th tensor product state $\rho^{\otimes n}$ of any state $\rho$ on $\mathcal{H}$.

Proof: If $\det\rho \neq 0$, this lemma is trivial based on the fact that $\det(\rho)^{-1/k}\rho \in \mathrm{SL}(\mathcal{H})$. If $\det\rho = 0$, there exists a sequence $\{\rho_i\}_{i=1}^\infty$ such that $\det\rho_i \neq 0$ and $\rho_i \to \rho$ as $i \to \infty$. We have $\rho_i^{\otimes n} \to \rho^{\otimes n}$ as $i \to \infty$. Because a PVM $E^n \in \mathrm{Ir}^{\otimes n}$ is commutative with $\rho_i^{\otimes n}$, it is also commutative with $\rho^{\otimes n}$. ■

Definition 29 We can define the PVM $E^n \times E(\rho^{\otimes n})$ for any PVM $E^n \in \mathrm{Ir}^{\otimes n}$. Now we define the PVM $E^n_\rho$ satisfying $w(E^n_\rho) = 1$, $E^n_\rho \ge E^n \times E(\rho^{\otimes n})$ for a PVM $E^n \in \mathrm{Ir}^{\otimes n}$. Note that the PVM $E^n_\rho$ is not unique.

Proof of Lemma 14: From Lemmas 26 and 27, (J.12) and the definition of $E^n_\rho$, we obtain Lemma 14. ■

Proof of Lemma 19: From Lemma 22, (J.12) and the definition of $E^n_\rho$, we obtain Lemma 19. ■



Appendix K. Large deviation theory for an exponential family 

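In this section, we review the large deviation theory for an exponential family. A $d$-dimensional probability family is called an exponential family if there exist linearly independent real-valued random variables $F_1, \ldots, F_d$ and a probability distribution $p$ on the probability space $\Omega$ such that the family consists of the probability distributions

$$p_\theta(d\omega) := \exp\left( \sum_{i=1}^d \theta^i F_i(\omega) - \psi(\theta) \right) p(d\omega), \qquad \psi(\theta) := \log \int_\Omega \exp\left( \sum_{i=1}^d \theta^i F_i(\omega) \right) p(d\omega).$$

In this family, the parametric space is given by $\Theta := \{\theta \in \mathbb{R}^d \mid \psi(\theta) < \infty\}$, the parameter $\theta$ is called the natural parameter and the function $\psi(\theta)$ is called the potential. We define the dual potential $\phi(\theta)$ and the dual parameter $\eta(\theta)$, called the expectation parameter, as

$$\eta_i(\theta) := \frac{\partial\psi(\theta)}{\partial\theta^i} = \int_\Omega F_i(\omega)\, p_\theta(d\omega), \qquad \phi(\theta) := \max_{\theta'} \left( \sum_{i=1}^d \theta'^i \eta_i(\theta) - \psi(\theta') \right).$$

From (K.1), we have

$$\phi(\theta) = \sum_{i=1}^d \theta^i \eta_i(\theta) - \psi(\theta).$$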
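In this family, the sufficient statistics are given by $F_1(\omega), \ldots, F_d(\omega)$. The MLE $\hat\theta(\omega)$ is given by $\eta_i(\hat\theta(\omega)) = F_i(\omega)$. The KL divergence $D(\theta\|\theta_0) := D(p_\theta\|p_{\theta_0})$ is calculated by

$$D(\theta\|\theta_0) = \int_\Omega \log\frac{p_\theta(\omega)}{p_{\theta_0}(\omega)}\, p_\theta(d\omega) = \int_\Omega \left( \sum_i (\theta^i - \theta_0^i) F_i(\omega) + \psi(\theta_0) - \psi(\theta) \right) p_\theta(d\omega)$$

$$= \sum_i (\theta^i - \theta_0^i)\eta_i(\theta) + \psi(\theta_0) - \psi(\theta) = \phi(\theta) + \psi(\theta_0) - \sum_i \theta_0^i \eta_i(\theta)$$

$$= \max_{\theta'} \left( \sum_i (\theta'^i - \theta_0^i)\eta_i(\theta) - \log \int_\Omega \exp\left( \sum_i (\theta'^i - \theta_0^i) F_i(\omega) \right) p_{\theta_0}(d\omega) \right).$$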

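Next, we discuss the $n$-i.i.d. extension of the family $\{p_\theta \mid \theta \in \Theta\}$. For the data $\vec\omega_n := (\omega_1, \ldots, \omega_n) \in \Omega^n$, the probability distribution $p^n_\theta(\vec\omega_n) := p_\theta(\omega_1) \cdots p_\theta(\omega_n)$ is given by

$$p^n_\theta(d\vec\omega_n) = \exp\left( n \sum_i \theta^i F_{n,i}(\vec\omega_n) - n\psi(\theta) \right) p^n(d\vec\omega_n),$$

$$p^n(d\vec\omega_n) := p(d\omega_1) \cdots p(d\omega_n), \qquad F_{n,i}(\vec\omega_n) := \frac{1}{n} \sum_{k=1}^n F_i(\omega_k).$$

Since the expectation parameter of the probability family $\{p^n_\theta \mid \theta \in \Theta\}$ is given by $n\eta_i(\theta)$, the MLE $\hat\theta_n(\vec\omega_n)$ is given by

$$n\eta_i(\hat\theta_n(\vec\omega_n)) = nF_{n,i}(\vec\omega_n). \tag{K.1}$$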

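Applying Cramér's Theorem [36] to the random variables $F_1, \ldots, F_d$ and the distribution $p_{\theta_0}$, for any subset $S \subset \mathbb{R}^d$ we have

$$\inf_{\eta\in\overline{S}} \sup_{\theta'\in\mathbb{R}^d} \left( \sum_i \theta'^i (\eta_i - \mathrm{E}_{\theta_0}(F_i)) - \psi_{\theta_0}(\theta') \right) \le \lim_{n\to\infty} \frac{-1}{n} \log p^n_{\theta_0}\{F_n \in S\} \le \inf_{\eta\in\mathrm{int}\,S} \sup_{\theta'\in\mathbb{R}^d} \left( \sum_i \theta'^i (\eta_i - \mathrm{E}_{\theta_0}(F_i)) - \psi_{\theta_0}(\theta') \right),$$

where

$$\mathrm{E}_{\theta_0}(F_i) := \int_\Omega F_i(\omega)\, p_{\theta_0}(d\omega),$$

$$\psi_{\theta_0}(\theta') := \log \int_\Omega \exp\left( \sum_i \theta'^i (F_i(\omega) - \mathrm{E}_{\theta_0}(F_i)) \right) p_{\theta_0}(d\omega),$$

$$F_n(\vec\omega_n) := (F_{n,1}(\vec\omega_n), \ldots, F_{n,d}(\vec\omega_n)),$$

and $\mathrm{int}\,S$ denotes the interior of $S$, which is consistent with $(\overline{S^c})^c$. Since

$$\sup_{\theta'\in\mathbb{R}^d} \left( \sum_i \theta'^i (\eta_i(\theta) - \mathrm{E}_{\theta_0}(F_i)) - \psi_{\theta_0}(\theta') \right) = \sup_{\theta'\in\mathbb{R}^d} \left( \sum_i (\theta'^i - \theta_0^i)\eta_i(\theta) - \psi(\theta') \right) + \psi(\theta_0) = D(\theta\|\theta_0)$$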
and the map $\theta \mapsto D(\theta\|\theta_0)$ is continuous, it follows from (K.1) that

$$\lim_{n\to\infty} \frac{-1}{n} \log p^n_{\theta_0}\{\hat\theta_n \in \Theta'\} = \inf_{\theta\in\Theta'} D(\theta\|\theta_0)$$

for any subset $\Theta' \subset \Theta$, which is equivalent to (76). Conversely, if an estimator $\{T_n(\vec\omega_n)\}$ satisfies the weak consistency

$$\lim_{n\to\infty} p^n_\theta\{\|T_n(\vec\omega_n) - \theta\| > \epsilon\} = 0, \quad \forall\epsilon > 0, \forall\theta \in \Theta,$$

then, similarly to (33), we can prove

$$\lim_{n\to\infty} \frac{-1}{n} \log p^n_{\theta_0}\{T_n(\vec\omega_n) \in \Theta'\} \le \inf_{\theta\in\Theta'} D(\theta\|\theta_0).$$

Therefore, we can conclude that the MLE is optimal in the large deviation sense for exponential families.
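
To make the quantities of this appendix concrete, the following sketch works them out for the simplest one-parameter case. It assumes $\Omega = \{0, 1\}$, $F(\omega) = \omega$ and a uniform base distribution $p$, which gives the Bernoulli family; these choices and all function names are illustrative assumptions rather than part of the text.

```python
import math

def psi(theta):
    """Potential: log of the integral of exp(theta * w) under uniform p on {0, 1}."""
    return math.log((1.0 + math.exp(theta)) / 2.0)

def eta(theta):
    """Expectation parameter: derivative of psi, equal to P_theta(w = 1)."""
    return math.exp(theta) / (1.0 + math.exp(theta))

def kl_closed_form(theta, theta0):
    """D(theta||theta0) = (theta - theta0) * eta(theta) + psi(theta0) - psi(theta)."""
    return (theta - theta0) * eta(theta) + psi(theta0) - psi(theta)

def kl_direct(theta, theta0):
    """The same divergence computed directly from the two distributions."""
    def probs(t):
        return (1.0 - eta(t), eta(t))
    return sum(a * math.log(a / b) for a, b in zip(probs(theta), probs(theta0)))

def mle(samples):
    """Solve eta(theta_hat) = sample mean; needs a mean strictly inside (0, 1)."""
    mean = sum(samples) / len(samples)
    return math.log(mean / (1.0 - mean))

theta_hat = mle([1, 0, 1, 1])
```

In this family the closed form in terms of the potential agrees with the direct computation of the KL divergence, and the MLE maps the sample mean back to the natural parameter through the inverse of $\eta$.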

Appendix L. Estimation of spectrum for unitary invariant family 

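Suppose that a multi-parametric quantum state family $\mathcal{S}$ satisfies

$$U\rho U^* \in \mathcal{S}, \quad \forall\rho \in \mathcal{S},$$

and that the vector $p(\rho) = (p_1(\rho), \ldots, p_d(\rho))$ satisfies $p_i(\rho) \ge p_{i+1}(\rho)$, where $d$ is the dimension of $\mathcal{H}$. Keyl and Werner's estimator $M^{KW} = \{M^{KW,n}\}$ satisfies

$$\lim_{n\to\infty} \frac{-1}{n} \log P^{M^{KW,n}}_\rho\{\hat p \in \mathcal{R}\} = \inf_{p\in\mathcal{R}} D(p\|p(\rho)), \tag{L.1}$$

where $\mathcal{R}$ is a subset consisting of $d$-nomial distributions [38]. Conversely, if a sequence of estimators $M = \{M^n\}$ satisfies

$$P^{M^n}_\rho\{\|\hat p - p(\rho)\| > \epsilon\} \to 0, \quad \forall\epsilon > 0, \forall\rho \in \mathcal{S},$$

then we can show

$$\limsup_{n\to\infty} \frac{-1}{n} \log P^{M^n}_\rho\{\hat p \in \mathcal{R}\} \le \inf_{\sigma:\, p(\sigma)\in\mathcal{R}} D(\sigma\|\rho) \tag{L.2}$$

in a similar way to (33). Since

$$\min_{U:\,\mathrm{unitary}} D(U\sigma U^*\|\rho) = D(p(\sigma)\|p(\rho)),$$

the RHS of (L.2) equals the RHS of (L.1). Therefore, Keyl and Werner's estimator $M^{KW}$ is optimal in the sense of large deviation. Now, we consider a parametric subspace $\{p_\theta \mid \theta \in \Theta\}$ of $d$-nomial distributions. Assume that $p(\rho) = p_{\theta_0}$, then

$$\lim_{\epsilon\to 0} \frac{1}{\epsilon^2} \inf_{\theta:\, \|\theta - \theta_0\| > \epsilon} D(p_\theta\|p_{\theta_0}) = \frac{1}{2} \min_{\|\xi\| = 1} \sum_{i,j} J_{\theta_0;i,j}\, \xi^i \xi^j, \tag{L.3}$$

where $J_{\theta;i,j}$ is the Fisher information matrix of $\{p_\theta \mid \theta \in \Theta\}$. Since the convergence of the LHS of (L.3) is uniform and the RHS of (L.3) is continuous for $\theta$, the bound of the weak consistency coincides with the bound of the strong consistency.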